
Commit acd4f07

Create a doc for versioning info (#113601)
1 parent 71c252c commit acd4f07

2 files changed: +302 -45 lines

CONTRIBUTING.md

Lines changed: 5 additions & 45 deletions
@@ -660,51 +660,11 @@ node cannot continue to operate as a member of the cluster:
 
 Errors like this should be very rare. When in doubt, prefer `WARN` to `ERROR`.
 
-### Version numbers in the Elasticsearch codebase
-
-Starting in 8.8.0, we have separated out the version number representations
-of various aspects of Elasticsearch into their own classes, using their own
-numbering scheme separate to release version. The main ones are
-`TransportVersion` and `IndexVersion`, representing the version of the
-inter-node binary protocol and index data + metadata respectively.
-
-Separated version numbers are comprised of an integer number. The semantic
-meaning of a version number are defined within each `*Version` class. There
-is no direct mapping between separated version numbers and the release version.
-The versions used by any particular instance of Elasticsearch can be obtained
-by querying `/_nodes/info` on the node.
-
-#### Using separated version numbers
-
-Whenever a change is made to a component versioned using a separated version
-number, there are a few rules that need to be followed:
-
-1. Each version number represents a specific modification to that component,
-   and should not be modified once it is defined. Each version is immutable
-   once merged into `main`.
-2. To create a new component version, add a new constant to the respective class
-   with a descriptive name of the change being made. Increment the integer
-   number according to the particular `*Version` class.
-
-   If your pull request has a conflict around your new version constant,
-   you need to update your PR from `main` and change your PR to use the next
-   available version number.
-
-### Checking for cluster features
-
-As part of developing a new feature or change, you might need to determine
-if all nodes in a cluster have been upgraded to support your new feature.
-This can be done using `FeatureService`. To define and check for a new
-feature in a cluster:
-
-1. Define a new `NodeFeature` constant with a unique id for the feature
-   in a class related to the change you're doing.
-2. Return that constant from an instance of `FeatureSpecification.getFeatures`,
-   either an existing implementation or a new implementation. Make sure
-   the implementation is added as an SPI implementation in `module-info.java`
-   and `META-INF/services`.
-3. To check if all nodes in the cluster support the new feature, call
-   `FeatureService.clusterHasFeature(ClusterState, NodeFeature)`
+### Versioning Elasticsearch
+
+There are various concepts used to identify running node versions,
+and the capabilities and compatibility of those nodes. For more information,
+see `docs/internal/Versioning.md`
 
 ### Creating a distribution

docs/internal/Versioning.md

Lines changed: 297 additions & 0 deletions
@@ -0,0 +1,297 @@

Versioning Elasticsearch
========================

Elasticsearch is a complicated product, and is run in many different scenarios.
A single version number is not sufficient to cover the whole of the product;
instead, we need different concepts to provide versioning capabilities
for different aspects of Elasticsearch, depending on their scope, updatability,
responsiveness, and maintenance.

## Release version

This is the version number used for published releases of Elasticsearch
and the Elastic stack. It takes the form _major.minor.patch_,
with a corresponding version id.

Uses of this version number should be avoided, as it does not apply to
some scenarios, and relying on the release version will break Elasticsearch nodes.

The release version is accessible in code through `Build.current().version()`,
but it **should not** be assumed to be a semantic version number;
it could be any arbitrary string.

## Transport protocol

The transport protocol is used to send binary data between Elasticsearch nodes;
`TransportVersion` is the version number used for this protocol.
This version number is negotiated between each pair of nodes in the cluster
on first connection, and is set as the lower of the highest transport version
understood by each node.
This version is then accessible through the `getTransportVersion` method
on `StreamInput` and `StreamOutput`, so serialization code can read/write
objects in a form that will be understood by the other node.

Every change to the transport protocol is represented by a new transport version,
higher than all previous transport versions, which then becomes the highest version
recognized by that build of Elasticsearch. The version ids are stored
as constants in the `TransportVersions` class.
Each id has a standard pattern `M_NNN_SS_P`, where:
* `M` is the major version
* `NNN` is an incrementing id
* `SS` is used in subsidiary repos amending the default transport protocol
* `P` is used for patches and backports

When you make a change to the serialization form of any object,
you need to create a new sequential constant in `TransportVersions`,
introduced in the same PR that adds the change, that increments
the `NNN` component from the previous highest version,
with the other components set to zero.
For example, if the previous version number is `8_413_00_1`,
the next version number should be `8_414_00_0`.
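
As an illustrative sketch only (the constant name is made up, and the exact
declaration helper used in `TransportVersions` should be checked against the
existing constants), the new entry looks something like:

    public static final TransportVersion MY_NEW_WIRE_FORMAT = def(8_414_00_0);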

Once you have defined your constant, you then need to use it
in serialization code. If the transport version is at or above the new id,
the modified protocol should be used:

    str = in.readString();
    bool = in.readBoolean();
    if (in.getTransportVersion().onOrAfter(TransportVersions.NEW_CONSTANT)) {
        num = in.readVInt();
    }
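
The writing side needs the mirror-image check so that both ends agree on the
wire format. A minimal sketch, using the same illustrative fields as above:

    out.writeString(str);
    out.writeBoolean(bool);
    if (out.getTransportVersion().onOrAfter(TransportVersions.NEW_CONSTANT)) {
        out.writeVInt(num);
    }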

If a transport version change needs to be reverted, a **new** version constant
should be added representing the revert, and the version id checks
adjusted appropriately to only use the modified protocol between the version id
where the change was added and the new version id used for the revert (exclusive).
The `between` method can be used for this.
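
For example, if the change was introduced by `NEW_CONSTANT` and reverted by a
later `REVERT_CONSTANT` (both names illustrative), the read-side check above
would become something like:

    if (in.getTransportVersion().between(TransportVersions.NEW_CONSTANT, TransportVersions.REVERT_CONSTANT)) {
        num = in.readVInt();
    }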

Once a transport change with a new version has been merged into `main` or a release branch,
it **must not** be modified - this is so the meaning of that specific
transport version does not change.

_Elastic developers_ - please see the corresponding documentation for Serverless
on creating transport versions for Serverless changes.

### Collapsing transport versions

As each change adds a new constant, the list of constants in `TransportVersions`
will keep growing. However, once there has been an official release of Elasticsearch
that includes a change, the specific transport version for that change is no longer needed,
apart from constants that happen to be used for release builds.
As part of managing transport versions, consecutive transport versions can be
periodically collapsed together into those that are only used for release builds.
This task is normally performed by Core/Infra on a semi-regular basis,
usually after each new minor release, to collapse the transport versions
for the previous minor release. An example of such an operation can be found
[here](https://github.com/elastic/elasticsearch/pull/104937).

### Minimum compatibility versions

The transport version used between two nodes is determined by the initial handshake
(see `TransportHandshaker`, where the two nodes swap their highest known transport version).
The lowest transport version that is compatible with the current node
is determined by `TransportVersions.MINIMUM_COMPATIBLE`,
and the node is prevented from joining the cluster if it is below that version.
This constant should be updated manually on a major release.

The minimum version that can be used for CCS is determined by
`TransportVersions.MINIMUM_CCS_VERSION`, but this is not actively checked
before queries are performed. Only if a query cannot be serialized at that
version is an action rejected. This constant is updated automatically
as part of performing a release.

### Mapping to release versions

For releases that do use a version number, it can be confusing to encounter
a log or exception message that references an arbitrary transport version,
where you don't know which release version that corresponds to. This is where
the `.toReleaseVersion()` method comes in. It uses metadata stored in a csv file
(`TransportVersions.csv`) to map from the transport version id to the corresponding
release version. For any transport versions it encounters without a direct mapping,
it performs a best guess based on the information it has. The csv file
is updated automatically as part of performing a release.
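
For example, a log message might render the version like this (a minimal
sketch; the logger call and message are illustrative):

    logger.warn("remote node is on transport version [{}]", remoteVersion.toReleaseVersion());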

In releases that do not have a release version number, that method becomes
a no-op.

### Managing patches and backports

Backporting transport version changes to previous releases
should only be done if absolutely necessary, as it is very easy to get wrong
and break the release in a way that is very hard to recover from.

If we consider the version number as an incrementing line, what we are doing is
grafting a change that takes effect at a certain point in the line,
to additionally take effect in a fixed window earlier in the line.

To take an example, using indicative version numbers, when the latest
transport version is 52, we decide we need to backport a change done in
transport version 50 to transport version 45. We use the `P` version id component
to create version 45.1 with the backported change.
This change will apply for version ids 45.1 to 45.9 (should they exist in the future).

The serialization code in the backport needs to use the backported protocol
for all version numbers 45.1 to 45.9. The `TransportVersion.isPatchFrom` method
can be used to easily determine if this is the case: `streamVersion.isPatchFrom(45.1)`.
However, `onOrAfter` also does what is needed on patch branches.

The serialization code in version 53 then needs to additionally check
version numbers 45.1-45.9 to use the backported protocol, also using the `isPatchFrom` method.

As an example, [this transport change](https://github.com/elastic/elasticsearch/pull/107862)
was backported from 8.15 to [8.14.0](https://github.com/elastic/elasticsearch/pull/108251)
and [8.13.4](https://github.com/elastic/elasticsearch/pull/108250) at the same time
(8.14 was a build candidate at the time).

The 8.13 PR has:

    if (transportVersion.onOrAfter(8.13_backport_id))

The 8.14 PR has:

    if (transportVersion.isPatchFrom(8.13_backport_id)
        || transportVersion.onOrAfter(8.14_backport_id))

The 8.15 PR has:

    if (transportVersion.isPatchFrom(8.13_backport_id)
        || transportVersion.isPatchFrom(8.14_backport_id)
        || transportVersion.onOrAfter(8.15_transport_id))

In particular, if you are backporting a change to a patch release,
you also need to make sure that any subsequent released version on any branch
also has that change, and knows about the patch backport ids and what they mean.

## Index version

Index version is a single incrementing version number for the index data format,
metadata, and associated mappings. It is declared the same way as the
transport version - with the pattern `M_NNN_SS_P`, for the major version, version id,
subsidiary version id, and patch number respectively.

Index version is stored in index metadata when an index is created,
and it is used to determine the storage format and what functionality that index supports.
The index version does not change once an index is created.

In the same way as transport versions, when a change is needed to the index
data format or metadata, or new mapping types are added, create a new version constant
below the last one, incrementing the `NNN` version component.
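
As a rough sketch (the constant name is made up, and the declaration helper is
assumed to mirror the transport version example above, so check the existing
index version constants for the exact form):

    public static final IndexVersion NEW_MAPPING_TYPE_SUPPORT = def(8_512_00_0);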

Unlike transport versions, index version constants cannot be collapsed together,
as an index keeps its creation version id once it is created.
Fortunately, new index versions are only created once a month or so,
so we don't have a large list of index versions that need managing.

Similar to transport versions, index version has a `toReleaseVersion` method to map
onto release versions, in appropriate situations.

## Cluster Features

Cluster features are identifiers published by a node in cluster state,
indicating that the node supports a particular top-level operation or set of functionality.
They are used for internal checks within Elasticsearch, and for gating tests
on certain functionality - for example, to check all nodes have upgraded
to a certain point before running a large migration operation to a new data format.
Cluster features should not be referenced by anything outside the Elasticsearch codebase.

Cluster features are indicative of top-level functionality introduced to
Elasticsearch - e.g. a new transport endpoint, or new operations.

Cluster features are also used to check that nodes can join a cluster - once all nodes
in a cluster support a particular feature, no node that does not support that feature
can then join the cluster. This is to ensure that once a feature is supported
by a cluster, it will then always be supported in the future.

To declare a new cluster feature, add an implementation of the `FeatureSpecification` SPI,
suitably registered (or use an existing one for your code area), and add the feature
as a constant to be returned by `getFeatures`. To then check whether all nodes
in the cluster support that feature, use the method `clusterHasFeature` on `FeatureService`.
It is only possible to check whether all nodes in the cluster have a feature;
individual node checks should not be done.
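
A minimal sketch of what this can look like - the class, feature name, and id
below are illustrative, and the SPI registration (`module-info.java` and
`META-INF/services`) is not shown:

    public class MyAreaFeatures implements FeatureSpecification {

        // unique, stable id for the feature; it must never change once released
        public static final NodeFeature DATA_MIGRATION_V2 = new NodeFeature("my_area.data_migration_v2");

        @Override
        public Set<NodeFeature> getFeatures() {
            return Set.of(DATA_MIGRATION_V2);
        }
    }

The corresponding check, wherever a `FeatureService` and the cluster state are
available:

    if (featureService.clusterHasFeature(clusterState, MyAreaFeatures.DATA_MIGRATION_V2)) {
        // every node in the cluster understands the new data format
    }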

Once a cluster feature is declared and deployed, it cannot be modified or removed,
else new nodes will not be able to join existing clusters.
If functionality represented by a cluster feature needs to be removed,
a new cluster feature should be added indicating that functionality is no longer
supported, and the code modified accordingly (bearing in mind additional BwC constraints).

The cluster features infrastructure is only designed to support a few hundred features
per major release, and once features are added to a cluster they cannot be removed.
Cluster features should therefore be used sparingly.
Adding too many cluster features risks increasing cluster instability.

When we release a new major version N, we limit our backwards compatibility
to the highest minor of the previous major N-1. Therefore, any cluster formed
with the new major version is guaranteed to have all features introduced during
releases of major N-1. All such features can be deemed to be met by the cluster,
and the features themselves can be removed from cluster state over time,
and the feature checks removed from the code of major version N.

### Testing

Tests often want to check if a certain feature is implemented / available on all nodes,
particularly BwC or mixed-cluster tests.

Rather than introducing a production feature just for a test condition,
this can be done by adding a _test feature_ in an implementation of
`FeatureSpecification.getTestFeatures`. These features will only be set
on clusters running as part of an integration test. Even so, cluster features
should be used sparingly if possible; the Capabilities API (described below)
is generally a better option for test conditions.

In Java REST tests, checking cluster features can be done using
`ESRestTestCase.clusterHasFeature(feature)`.
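
A sketch of typical usage, assuming the string-id form of `clusterHasFeature`
(the feature id is the illustrative one from the sketch above):

    // skip this test unless every node in the cluster supports the feature
    assumeTrue("requires data migration v2", clusterHasFeature("my_area.data_migration_v2"));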

In YAML REST tests, conditions can be defined in the `requires` or `skip` sections
that use cluster features; see [here](https://github.com/elastic/elasticsearch/blob/main/rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/README.asciidoc#skipping-tests) for more information.

To aid with backwards compatibility tests, the test framework adds synthetic features
for each previously released Elasticsearch version, of the form `gte_v{VERSION}`
(for example `gte_v8.14.2`).
This can be used to add conditions based on previous releases. It _cannot_ be used
to check the current snapshot version; real features or capabilities should be
used instead.

## Capabilities

The Capabilities API is a REST API for external clients to check the capabilities
of an Elasticsearch cluster. As it is dynamically calculated for every query,
it is not limited in size or usage.

A capabilities query can be used to query for three things:
* Is this endpoint supported for this HTTP method?
* Are these parameters of this endpoint supported?
* Are these capabilities (arbitrary string ids) of this endpoint supported?

The API returns a simple true/false, indicating if all specified aspects
of the endpoint are supported by all nodes in the cluster.
If any aspect is not supported by any one node, the API returns `false`.

The API can also return `supported: null` (indicating unknown)
if there was a problem communicating with one or more nodes in the cluster.

All registered endpoints automatically work with the endpoint existence check.
To add support for parameter and capability queries on your REST endpoint,
implement the `supportedQueryParameters` and `supportedCapabilities` methods in your rest handler.
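
A minimal sketch of those overrides in a rest handler (the parameter and
capability names are illustrative, and the return types shown are an
assumption to be checked against `RestHandler`):

    @Override
    public Set<String> supportedQueryParameters() {
        return Set.of("param1", "param2");
    }

    @Override
    public Set<String> supportedCapabilities() {
        return Set.of("cap1", "cap2");
    }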

To perform a capability query, perform a REST call to the `_capabilities` API,
with parameters `method`, `path`, `parameters`, `capabilities`.
The call will query every node in the cluster, and return `{supported: true}`
if all nodes support that specific combination of method, path, query parameters,
and endpoint capabilities. If any single aspect is not supported,
the query will return `{supported: false}`. If there are any problems
communicating with nodes in the cluster, the response will be `{supported: null}`,
indicating support or lack thereof cannot currently be determined.
Capabilities can be checked using the `clusterHasCapability` method in `ESRestTestCase`.

Similar to cluster features, YAML tests can have `skip` and `requires` conditions
specified with capabilities like the following:

    - requires:
        capabilities:
          - method: GET
            path: /_endpoint
            parameters: [param1, param2]
            capabilities: [cap1, cap2]

`method: GET` is the default, and does not need to be explicitly specified.
