
Commit fdb1bfd (2 parents: e6d9997 + 4624ba5)

Merge remote-tracking branch 'elastic/master' into feature-aware-check

* elastic/master:
  [Tests] Muting RatedRequestsTests#testXContentParsingIsNotLenient
  TEST: Retry synced-flush if ongoing ops on primary (elastic#30978)
  Fix docs build.
  Only auto-update license signature if all nodes ready (elastic#30859)
  Add BlobContainer.writeBlobAtomic() (elastic#30902)
  Add a doc value format to binary fields. (elastic#30860)
  Take into account the return value of TcpTransport.readMessageLength(...) in Netty4SizeHeaderFrameDecoder
  Move caching of the size of a directory to `StoreDirectory`. (elastic#30581)
  Clarify docs about boolean operator precedence. (elastic#30808)
  Docs: remove notes on sparsity. (elastic#30905)
  Fix MatchPhrasePrefixQueryBuilderTests#testPhraseOnFieldWithNoTerms
  run overflow forecast a 2nd time as regression test for elastic/ml-cpp#110 (elastic#30969)
  Improve documentation of dynamic mappings. (elastic#30952)
  Decouple MultiValueMode. (elastic#31075)
  Docs: Clarify constraints on scripted similarities. (elastic#31076)

File tree

62 files changed: +1018 -546 lines changed


buildSrc/src/main/resources/checkstyle_suppressions.xml

-2
@@ -505,8 +505,6 @@
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]cluster[/\\]settings[/\\]ClusterSettingsIT.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]cluster[/\\]shards[/\\]ClusterSearchShardsIT.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]cluster[/\\]structure[/\\]RoutingIteratorTests.java" checks="LineLength" />
-  <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]blobstore[/\\]FsBlobStoreContainerTests.java" checks="LineLength" />
-  <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]blobstore[/\\]FsBlobStoreTests.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]breaker[/\\]MemoryCircuitBreakerTests.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]geo[/\\]ShapeBuilderTests.java" checks="LineLength" />
   <suppress files="server[/\\]src[/\\]test[/\\]java[/\\]org[/\\]elasticsearch[/\\]common[/\\]hash[/\\]MessageDigestsTests.java" checks="LineLength" />

docs/reference/how-to/general.asciidoc

-91
@@ -40,94 +40,3 @@ better. For instance if a user searches for two words `foo` and `bar`, a match
 across different chapters is probably very poor, while a match within the same
 paragraph is likely good.
 
-[float]
-[[sparsity]]
-=== Avoid sparsity
-
-The data-structures behind Lucene, which Elasticsearch relies on in order to
-index and store data, work best with dense data, ie. when all documents have the
-same fields. This is especially true for fields that have norms enabled (which
-is the case for `text` fields by default) or doc values enabled (which is the
-case for numerics, `date`, `ip` and `keyword` by default).
-
-The reason is that Lucene internally identifies documents with so-called doc
-ids, which are integers between 0 and the total number of documents in the
-index. These doc ids are used for communication between the internal APIs of
-Lucene: for instance searching on a term with a `match` query produces an
-iterator of doc ids, and these doc ids are then used to retrieve the value of
-the `norm` in order to compute a score for these documents. The way this `norm`
-lookup is implemented currently is by reserving one byte for each document.
-The `norm` value for a given doc id can then be retrieved by reading the
-byte at index `doc_id`. While this is very efficient and helps Lucene quickly
-have access to the `norm` values of every document, this has the drawback that
-documents that do not have a value will also require one byte of storage.
-
-In practice, this means that if an index has `M` documents, norms will require
-`M` bytes of storage *per field*, even for fields that only appear in a small
-fraction of the documents of the index. Although slightly more complex with doc
-values due to the fact that doc values have multiple ways that they can be
-encoded depending on the type of field and on the actual data that the field
-stores, the problem is very similar. In case you wonder: `fielddata`, which was
-used in Elasticsearch pre-2.0 before being replaced with doc values, also
-suffered from this issue, except that the impact was only on the memory
-footprint since `fielddata` was not explicitly materialized on disk.
-
-Note that even though the most notable impact of sparsity is on storage
-requirements, it also has an impact on indexing speed and search speed since
-these bytes for documents that do not have a field still need to be written
-at index time and skipped over at search time.
-
-It is totally fine to have a minority of sparse fields in an index. But beware
-that if sparsity becomes the rule rather than the exception, then the index
-will not be as efficient as it could be.
-
-This section mostly focused on `norms` and `doc values` because those are the
-two features that are most affected by sparsity. Sparsity also affect the
-efficiency of the inverted index (used to index `text`/`keyword` fields) and
-dimensional points (used to index `geo_point` and numerics) but to a lesser
-extent.
-
-Here are some recommendations that can help avoid sparsity:
-
-[float]
-==== Avoid putting unrelated data in the same index
-
-You should avoid putting documents that have totally different structures into
-the same index in order to avoid sparsity. It is often better to put these
-documents into different indices, you could also consider giving fewer shards
-to these smaller indices since they will contain fewer documents overall.
-
-Note that this advice does not apply in the case that you need to use
-parent/child relations between your documents since this feature is only
-supported on documents that live in the same index.
-
-[float]
-==== Normalize document structures
-
-Even if you really need to put different kinds of documents in the same index,
-maybe there are opportunities to reduce sparsity. For instance if all documents
-in the index have a timestamp field but some call it `timestamp` and others
-call it `creation_date`, it would help to rename it so that all documents have
-the same field name for the same data.
-
-[float]
-==== Avoid types
-
-Types might sound like a good way to store multiple tenants in a single index.
-They are not: given that types store everything in a single index, having
-multiple types that have different fields in a single index will also cause
-problems due to sparsity as described above. If your types do not have very
-similar mappings, you might want to consider moving them to a dedicated index.
-
-[float]
-==== Disable `norms` and `doc_values` on sparse fields
-
-If none of the above recommendations apply in your case, you might want to
-check whether you actually need `norms` and `doc_values` on your sparse fields.
-`norms` can be disabled if producing scores is not necessary on a field, this is
-typically true for fields that are only used for filtering. `doc_values` can be
-disabled on fields that are neither used for sorting nor for aggregations.
-Beware that this decision should not be made lightly since these parameters
-cannot be changed on a live index, so you would have to reindex if you realize
-that you need `norms` or `doc_values`.
-

docs/reference/index-modules/similarity.asciidoc

+12-2
@@ -341,7 +341,18 @@ Which yields:
 // TESTRESPONSE[s/"took": 12/"took" : $body.took/]
 // TESTRESPONSE[s/OzrdjxNtQGaqs4DmioFw9A/$body.hits.hits.0._node/]
 
-You might have noticed that a significant part of the script depends on
+WARNING: While scripted similarities provide a lot of flexibility, there is
+a set of rules that they need to satisfy. Failing to do so could make
+Elasticsearch silently return wrong top hits or fail with internal errors at
+search time:
+
+ - Returned scores must be positive.
+ - All other variables remaining equal, scores must not decrease when
+   `doc.freq` increases.
+ - All other variables remaining equal, scores must not increase when
+   `doc.length` increases.
+
+You might have noticed that a significant part of the above script depends on
 statistics that are the same for every document. It is possible to make the
 above slightly more efficient by providing an `weight_script` which will
 compute the document-independent part of the score and will be available
@@ -506,7 +517,6 @@ GET /index/_search?explain=true
 
 ////////////////////
 
-
 Type name: `scripted`
 
 [float]
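The three constraints in the added WARNING can be sanity-checked against a TF-IDF-style score function. The sketch below is a plain-Java stand-in, not the docs' Painless script; the class name, method signature, and exact formula are illustrative assumptions:

```java
// Illustrative score function obeying the scripted-similarity rules:
// positive, non-decreasing in doc.freq, non-increasing in doc.length.
final class ScriptedSimilaritySketch {
    static double score(double freq, long docCount, long docFreq, double docLength) {
        double idf = Math.log((docCount + 1.0) / (docFreq + 1.0)) + 1.0; // always > 0
        double tf = Math.sqrt(freq);              // grows with doc.freq
        double norm = 1.0 / Math.sqrt(docLength); // shrinks as doc.length grows
        return idf * tf * norm;                   // product of positives is positive
    }
}
```

A score that, say, subtracted a length penalty could go negative and violate the first rule, which is why a multiplicative norm is the safer shape.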

docs/reference/mapping/dynamic/field-mapping.asciidoc

+1-1
@@ -135,6 +135,6 @@ PUT my_index/_doc/1
 }
 --------------------------------------------------
 // CONSOLE
-<1> The `my_float` field is added as a <<number,`double`>> field.
+<1> The `my_float` field is added as a <<number,`float`>> field.
 <2> The `my_integer` field is added as a <<number,`long`>> field.

docs/reference/mapping/dynamic/templates.asciidoc

+16-5
@@ -46,11 +46,22 @@ name as an existing template, it will replace the old version.
 [[match-mapping-type]]
 ==== `match_mapping_type`
 
-The `match_mapping_type` matches on the datatype detected by
-<<dynamic-field-mapping,dynamic field mapping>>, in other words, the datatype
-that Elasticsearch thinks the field should have. Only the following datatypes
-can be automatically detected: `boolean`, `date`, `double`, `long`, `object`,
-`string`. It also accepts `*` to match all datatypes.
+The `match_mapping_type` is the datatype detected by the json parser. Since
+JSON doesn't allow to distinguish a `long` from an `integer` or a `double` from
+a `float`, it will always choose the wider datatype, ie. `long` for integers
+and `double` for floating-point numbers.
+
+The following datatypes may be automatically detected:
+
+- `boolean` when `true` or `false` are encountered.
+- `date` when <<date-detection,date detection>> is enabled and a string is
+  found that matches any of the configured date formats.
+- `double` for numbers with a decimal part.
+- `long` for numbers without a decimal part.
+- `object` for objects, also called hashes.
+- `string` for character strings.
+
+`*` may also be used in order to match all datatypes.
 
 For example, if we wanted to map all integer fields as `integer` instead of
 `long`, and all `string` fields as both `text` and `keyword`, we

docs/reference/query-dsl/query-string-syntax.asciidoc

+4-21
@@ -233,26 +233,10 @@ states that:
 * `news` must not be present
 * `quick` and `brown` are optional -- their presence increases the relevance
 
-The familiar operators `AND`, `OR` and `NOT` (also written `&&`, `||` and `!`)
-are also supported. However, the effects of these operators can be more
-complicated than is obvious at first glance. `NOT` takes precedence over
-`AND`, which takes precedence over `OR`. While the `+` and `-` only affect
-the term to the right of the operator, `AND` and `OR` can affect the terms to
-the left and right.
-
-****
-Rewriting the above query using `AND`, `OR` and `NOT` demonstrates the
-complexity:
-
-`quick OR brown AND fox AND NOT news`::
-
-This is incorrect, because `brown` is now a required term.
-
-`(quick OR brown) AND fox AND NOT news`::
-
-This is incorrect because at least one of `quick` or `brown` is now required
-and the search for those terms would be scored differently from the original
-query.
+The familiar boolean operators `AND`, `OR` and `NOT` (also written `&&`, `||`
+and `!`) are also supported but beware that they do not honor the usual
+precedence rules, so parentheses should be used whenever multiple operators are
+used together. For instance the previous query could be rewritten as:
 
 `((quick AND fox) OR (brown AND fox) OR fox) AND NOT news`::
 
@@ -270,7 +254,6 @@ would look like this:
 }
 }
 
-****
 
 ===== Grouping
 

modules/aggs-matrix-stats/src/main/java/org/elasticsearch/search/aggregations/support/MultiValuesSource.java

+1-1
@@ -47,7 +47,7 @@ public NumericDoubleValues getField(final int ordinal, LeafReaderContext ctx) th
         if (ordinal > names.length) {
             throw new IndexOutOfBoundsException("ValuesSource array index " + ordinal + " out of bounds");
         }
-        return multiValueMode.select(values[ordinal].doubleValues(ctx), Double.NEGATIVE_INFINITY);
+        return multiValueMode.select(values[ordinal].doubleValues(ctx));
     }
 }
 

modules/lang-expression/src/main/java/org/elasticsearch/script/expression/DateMethodValueSource.java

+1-1
@@ -54,7 +54,7 @@ class DateMethodValueSource extends FieldDataValueSource {
     public FunctionValues getValues(Map context, LeafReaderContext leaf) throws IOException {
         AtomicNumericFieldData leafData = (AtomicNumericFieldData) fieldData.load(leaf);
         final Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("UTC"), Locale.ROOT);
-        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues(), 0d);
+        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues());
         return new DoubleDocValues(this) {
             @Override
             public double doubleVal(int docId) throws IOException {

modules/lang-expression/src/main/java/org/elasticsearch/script/expression/DateObjectValueSource.java

+1-1
@@ -56,7 +56,7 @@ class DateObjectValueSource extends FieldDataValueSource {
     public FunctionValues getValues(Map context, LeafReaderContext leaf) throws IOException {
         AtomicNumericFieldData leafData = (AtomicNumericFieldData) fieldData.load(leaf);
         MutableDateTime joda = new MutableDateTime(0, DateTimeZone.UTC);
-        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues(), 0d);
+        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues());
         return new DoubleDocValues(this) {
             @Override
             public double doubleVal(int docId) throws IOException {

modules/lang-expression/src/main/java/org/elasticsearch/script/expression/FieldDataValueSource.java

+1-1
@@ -68,7 +68,7 @@ public int hashCode() {
     @SuppressWarnings("rawtypes") // ValueSource uses a rawtype
     public FunctionValues getValues(Map context, LeafReaderContext leaf) throws IOException {
         AtomicNumericFieldData leafData = (AtomicNumericFieldData) fieldData.load(leaf);
-        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues(), 0d);
+        NumericDoubleValues docValues = multiValueMode.select(leafData.getDoubleValues());
         return new DoubleDocValues(this) {
             @Override
             public double doubleVal(int doc) throws IOException {

modules/rank-eval/src/test/java/org/elasticsearch/index/rankeval/RatedRequestsTests.java

+1
@@ -131,6 +131,7 @@ public void testXContentRoundtrip() throws IOException {
         }
     }
 
+    @AwaitsFix(bugUrl="https://github.com/elastic/elasticsearch/issues/31104")
     public void testXContentParsingIsNotLenient() throws IOException {
         RatedRequest testItem = createTestItem(randomBoolean());
         XContentType xContentType = randomFrom(XContentType.values());

modules/transport-netty4/src/main/java/org/elasticsearch/transport/netty4/Netty4SizeHeaderFrameDecoder.java

+11-9
@@ -37,17 +37,19 @@ final class Netty4SizeHeaderFrameDecoder extends ByteToMessageDecoder {
     protected void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
         try {
             BytesReference networkBytes = Netty4Utils.toBytesReference(in);
-            int messageLength = TcpTransport.readMessageLength(networkBytes) + HEADER_SIZE;
-            // If the message length is -1, we have not read a complete header. If the message length is
-            // greater than the network bytes available, we have not read a complete frame.
-            if (messageLength != -1 && messageLength <= networkBytes.length()) {
-                final ByteBuf message = in.skipBytes(HEADER_SIZE);
-                // 6 bytes would mean it is a ping. And we should ignore.
-                if (messageLength != 6) {
-                    out.add(message);
+            int messageLength = TcpTransport.readMessageLength(networkBytes);
+            // If the message length is -1, we have not read a complete header.
+            if (messageLength != -1) {
+                int messageLengthWithHeader = messageLength + HEADER_SIZE;
+                // If the message length is greater than the network bytes available, we have not read a complete frame.
+                if (messageLengthWithHeader <= networkBytes.length()) {
+                    final ByteBuf message = in.skipBytes(HEADER_SIZE);
+                    // 6 bytes would mean it is a ping. And we should ignore.
+                    if (messageLengthWithHeader != 6) {
+                        out.add(message);
+                    }
                 }
             }
-
         } catch (IllegalArgumentException ex) {
             throw new TooLongFrameException(ex);
         }
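The bug this change fixes is easy to miss: the old code added `HEADER_SIZE` before comparing against `-1`, so the incomplete-header sentinel could never match. A self-contained sketch of the corrected ordering follows; the class name, the byte layout, and the toy `readMessageLength` are illustrative assumptions (the real method also validates the header and works on Netty `ByteBuf`s rather than byte arrays):

```java
// Sketch of the fixed length handling: test the -1 sentinel FIRST,
// then add the header size.
final class FrameLengthCheck {
    // Assumption for this sketch: a 6-byte header ('E', 'S', then a
    // 4-byte big-endian payload length), matching HEADER_SIZE above.
    static final int HEADER_SIZE = 6;

    // Toy stand-in for TcpTransport.readMessageLength:
    // -1 when the header is incomplete, else the payload length.
    static int readMessageLength(byte[] buffer) {
        if (buffer.length < HEADER_SIZE) {
            return -1;
        }
        return ((buffer[2] & 0xFF) << 24) | ((buffer[3] & 0xFF) << 16)
                | ((buffer[4] & 0xFF) << 8) | (buffer[5] & 0xFF);
    }

    // True when a complete, non-ping frame is buffered.
    static boolean completeNonPingFrame(byte[] buffer) {
        int messageLength = readMessageLength(buffer);
        if (messageLength == -1) {
            return false; // incomplete header: must check before adding HEADER_SIZE
        }
        int messageLengthWithHeader = messageLength + HEADER_SIZE;
        // a 6-byte frame (empty payload) is a ping and is ignored
        return messageLengthWithHeader <= buffer.length && messageLengthWithHeader != 6;
    }
}
```

In the old version, `readMessageLength(...) + HEADER_SIZE` turned the `-1` sentinel into `5`, so the `!= -1` guard silently never fired.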

server/src/main/java/org/elasticsearch/common/blobstore/BlobContainer.java

+23
@@ -74,6 +74,29 @@
      */
     void writeBlob(String blobName, InputStream inputStream, long blobSize) throws IOException;
 
+    /**
+     * Reads blob content from the input stream and writes it to the container in a new blob with the given name,
+     * using an atomic write operation if the implementation supports it. When the BlobContainer implementation
+     * does not provide a specific implementation of writeBlobAtomic(String, InputStream, long), then
+     * the {@link #writeBlob(String, InputStream, long)} method is used.
+     *
+     * This method assumes the container does not already contain a blob of the same blobName. If a blob by the
+     * same name already exists, the operation will fail and an {@link IOException} will be thrown.
+     *
+     * @param blobName
+     *          The name of the blob to write the contents of the input stream to.
+     * @param inputStream
+     *          The input stream from which to retrieve the bytes to write to the blob.
+     * @param blobSize
+     *          The size of the blob to be written, in bytes. It is implementation dependent whether
+     *          this value is used in writing the blob to the repository.
+     * @throws FileAlreadyExistsException if a blob by the same name already exists
+     * @throws IOException if the input stream could not be read, or the target blob could not be written to.
+     */
+    default void writeBlobAtomic(final String blobName, final InputStream inputStream, final long blobSize) throws IOException {
+        writeBlob(blobName, inputStream, blobSize);
+    }
+
     /**
      * Deletes a blob with giving name, if the blob exists. If the blob does not exist,
      * this method throws a NoSuchFileException.
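The new method relies on Java 8 interface default methods: every existing `BlobContainer` implementation keeps compiling, and only stores that can actually write atomically need to override. A minimal sketch of the pattern (`SimpleBlobContainer` and `InMemoryBlobContainer` are hypothetical names for illustration, not Elasticsearch types):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Mirrors the shape of the new API: a default method provides a
// non-atomic fallback that delegates to the plain write.
interface SimpleBlobContainer {
    void writeBlob(String blobName, InputStream in, long blobSize) throws IOException;

    // Containers that support atomic writes override this; others inherit
    // the delegation below and keep compiling unchanged.
    default void writeBlobAtomic(String blobName, InputStream in, long blobSize) throws IOException {
        writeBlob(blobName, in, blobSize);
    }
}

// A trivial implementation that does NOT override writeBlobAtomic,
// so atomic writes fall through to writeBlob.
class InMemoryBlobContainer implements SimpleBlobContainer {
    final Map<String, byte[]> blobs = new HashMap<>();

    @Override
    public void writeBlob(String blobName, InputStream in, long blobSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        blobs.put(blobName, out.toByteArray());
    }
}
```

The FsBlobContainer change later in this commit is exactly such an override.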

server/src/main/java/org/elasticsearch/common/blobstore/fs/FsBlobContainer.java

+46-2
@@ -19,11 +19,12 @@
 
 package org.elasticsearch.common.blobstore.fs;
 
-import org.elasticsearch.core.internal.io.IOUtils;
+import org.elasticsearch.common.UUIDs;
 import org.elasticsearch.common.blobstore.BlobMetaData;
 import org.elasticsearch.common.blobstore.BlobPath;
 import org.elasticsearch.common.blobstore.support.AbstractBlobContainer;
 import org.elasticsearch.common.blobstore.support.PlainBlobMetaData;
+import org.elasticsearch.core.internal.io.IOUtils;
 import org.elasticsearch.core.internal.io.Streams;
 
 import java.io.BufferedInputStream;
@@ -56,8 +57,9 @@
  */
 public class FsBlobContainer extends AbstractBlobContainer {
 
-    protected final FsBlobStore blobStore;
+    private static final String TEMP_FILE_PREFIX = "pending-";
 
+    protected final FsBlobStore blobStore;
     protected final Path path;
 
     public FsBlobContainer(FsBlobStore blobStore, BlobPath blobPath, Path path) {
@@ -131,6 +133,48 @@ public void writeBlob(String blobName, InputStream inputStream, long blobSize) t
         IOUtils.fsync(path, true);
     }
 
+    @Override
+    public void writeBlobAtomic(final String blobName, final InputStream inputStream, final long blobSize) throws IOException {
+        final String tempBlob = tempBlobName(blobName);
+        final Path tempBlobPath = path.resolve(tempBlob);
+        try {
+            try (OutputStream outputStream = Files.newOutputStream(tempBlobPath, StandardOpenOption.CREATE_NEW)) {
+                Streams.copy(inputStream, outputStream);
+            }
+            IOUtils.fsync(tempBlobPath, false);
+
+            final Path blobPath = path.resolve(blobName);
+            // If the target file exists then Files.move() behaviour is implementation specific
+            // the existing file might be replaced or this method fails by throwing an IOException.
+            if (Files.exists(blobPath)) {
+                throw new FileAlreadyExistsException("blob [" + blobPath + "] already exists, cannot overwrite");
+            }
+            Files.move(tempBlobPath, blobPath, StandardCopyOption.ATOMIC_MOVE);
+        } catch (IOException ex) {
+            try {
+                deleteBlobIgnoringIfNotExists(tempBlob);
+            } catch (IOException e) {
+                ex.addSuppressed(e);
+            }
+            throw ex;
+        } finally {
+            IOUtils.fsync(path, true);
+        }
+    }
+
+    public static String tempBlobName(final String blobName) {
+        return "pending-" + blobName + "-" + UUIDs.randomBase64UUID();
+    }
+
+    /**
+     * Returns true if the blob is a leftover temporary blob.
+     *
+     * The temporary blobs might be left after failed atomic write operation.
+     */
+    public static boolean isTempBlobName(final String blobName) {
+        return blobName.startsWith(TEMP_FILE_PREFIX);
+    }
+
     @Override
     public void move(String source, String target) throws IOException {
         Path sourcePath = path.resolve(source);
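The filesystem implementation above follows a common write-to-temp-then-rename recipe: stream into a uniquely named `pending-` file, fsync it, then move it into place with `ATOMIC_MOVE` so readers never observe a half-written blob. A condensed sketch using only `java.nio.file` (the `AtomicWriteSketch` class is illustrative and omits the directory fsync and the custom delete helper the real code uses):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

final class AtomicWriteSketch {
    // Writes the stream to dir/blobName via a temp file and an atomic rename.
    static void writeAtomic(Path dir, String blobName, InputStream in) throws IOException {
        // unique temp name, mirroring FsBlobContainer.tempBlobName()
        Path temp = dir.resolve("pending-" + blobName + "-" + UUID.randomUUID());
        try {
            Files.copy(in, temp);
            Path target = dir.resolve(blobName);
            // Files.move() behaviour on an existing target is implementation
            // specific, so fail explicitly instead of relying on it
            if (Files.exists(target)) {
                throw new FileAlreadyExistsException(target.toString());
            }
            Files.move(temp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException ex) {
            Files.deleteIfExists(temp); // clean up the leftover "pending-" blob
            throw ex;
        }
    }
}
```

The fixed `TEMP_FILE_PREFIX` is what makes `isTempBlobName()` able to recognize and clean up blobs left behind by a crash mid-write.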

server/src/main/java/org/elasticsearch/common/util/SingleObjectCache.java

+5
@@ -64,6 +64,11 @@ public T getOrRefresh() {
         return cached;
     }
 
+    /** Return the potentially stale cached entry. */
+    protected final T getNoRefresh() {
+        return cached;
+    }
+
     /**
      * Returns a new instance to cache
      */
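For context, `SingleObjectCache` caches a single expensive-to-compute value: `getOrRefresh()` rebuilds it when stale, and the new `getNoRefresh()` exposes the possibly stale value without triggering a rebuild, which is useful when an approximate answer is acceptable. A simplified sketch of that shape (the real class uses a time-based refresh interval rather than the explicit stale flag assumed here):

```java
// TinyCache is a hypothetical simplification of SingleObjectCache.
abstract class TinyCache<T> {
    private volatile T cached;
    private volatile boolean stale = true;

    // Computes a fresh value; in the real class this is cache.refresh().
    protected abstract T refresh();

    // Rebuilds the entry when stale, then returns it.
    final T getOrRefresh() {
        if (stale) {
            cached = refresh();
            stale = false;
        }
        return cached;
    }

    // Return the potentially stale cached entry (null before the first refresh).
    protected final T getNoRefresh() {
        return cached;
    }

    final void markStale() {
        stale = true;
    }
}
```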
