Versioning Elasticsearch
========================

Elasticsearch is a complicated product, and is run in many different scenarios.
A single version number is not sufficient to cover the whole of the product;
instead, we need different concepts to provide versioning capabilities
for different aspects of Elasticsearch, depending on their scope, updatability,
responsiveness, and maintenance.

## Release version

This is the version number used for published releases of Elasticsearch
and the Elastic Stack. It takes the form _major.minor.patch_,
with a corresponding version id.

Uses of this version number should be avoided, as it does not apply in
all scenarios (some releases have no release version number at all), and relying
on the release version in those scenarios will break Elasticsearch nodes.

The release version is accessible in code through `Build.current().version()`,
but it **should not** be assumed that this is a semantic version number;
it could be any arbitrary string.

## Transport protocol

The transport protocol is used to send binary data between Elasticsearch nodes;
`TransportVersion` is the version number used for this protocol.
This version number is negotiated between each pair of nodes in the cluster
on first connection, and is set to the lower of the two nodes' highest
known transport versions.
This version is then accessible through the `getTransportVersion` method
on `StreamInput` and `StreamOutput`, so serialization code can read and write
objects in a form that will be understood by the other node.
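
The negotiation rule can be illustrated with a minimal sketch. It assumes transport versions are modelled as plain `int` ids rather than `TransportVersion` objects, and `HandshakeSketch` is a hypothetical name, not part of the codebase.

```java
/** Hypothetical model of the negotiation rule, with transport versions as plain int ids. */
final class HandshakeSketch {
    private HandshakeSketch() {}

    /** Each side ends up using the lower of the two highest known ids. */
    static int negotiate(int localHighest, int remoteHighest) {
        // The older node cannot understand anything newer than its own highest id,
        // so the newer node has to fall back to it.
        return Math.min(localHighest, remoteHighest);
    }
}
```

For instance, a node whose highest id is `8_414_00_0` connected to a node whose highest id is `8_413_00_1` would use `8_413_00_1` on that connection, in both directions.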

Every change to the transport protocol is represented by a new transport version,
higher than all previous transport versions, which then becomes the highest version
recognized by that build of Elasticsearch. The version ids are stored
as constants in the `TransportVersions` class.
Each id follows the standard pattern `M_NNN_SS_P`, where:
* `M` is the major version
* `NNN` is an incrementing id
* `SS` is used in subsidiary repos amending the default transport protocol
* `P` is used for patches and backports
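
Java ignores underscores in numeric literals, so an id such as `8_413_00_1` is simply the `int` 8413001, and each component can be recovered with digit arithmetic. The sketch below assumes one digit for `P`, two for `SS` and three for `NNN`, as the pattern suggests; `VersionIdParts` is a hypothetical helper.

```java
/** Hypothetical helper splitting an id of the form M_NNN_SS_P into its components. */
record VersionIdParts(int major, int nnn, int ss, int patch) {

    static VersionIdParts of(int id) {
        return new VersionIdParts(
                id / 1_000_000,       // M: digits above the lowest six
                (id / 1_000) % 1_000, // NNN: three-digit incrementing id
                (id / 10) % 100,      // SS: two-digit subsidiary component
                id % 10);             // P: single-digit patch/backport component
    }
}
```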

When you make a change to the serialization form of any object,
you need to create a new sequential constant in `TransportVersions`,
introduced in the same PR that adds the change, that increments
the `NNN` component of the previous highest version,
with all other components set to zero.
For example, if the previous highest version is `8_413_00_1`,
the next version should be `8_414_00_0`.
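
Under the same digit-width assumption, the next id can be computed by dropping the `SS` and `P` digits and incrementing `NNN`. `NextVersionId` is hypothetical; it only illustrates the arithmetic in the example above.

```java
/** Hypothetical helper computing the next main-line id after the previous highest one. */
final class NextVersionId {
    private NextVersionId() {}

    static int next(int previousHighest) {
        // Integer division by 1_000 drops the SS and P digits; add one to NNN,
        // then multiply back so that SS and P are zero.
        return (previousHighest / 1_000 + 1) * 1_000;
    }
}
```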

Once you have defined your constant, you need to use it
in serialization code. If the transport version is on or after the new id,
the modified protocol should be used:

    str = in.readString();
    bool = in.readBoolean();
    if (in.getTransportVersion().onOrAfter(TransportVersions.NEW_CONSTANT)) {
        // only present in the modified protocol
        num = in.readVInt();
    }
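
The write side must apply exactly the same check, or the two nodes will disagree about the byte layout. The sketch below models both sides with `java.io` data streams instead of `StreamInput`/`StreamOutput`, passes the negotiated version explicitly, and uses a fixed-width `writeInt` because `DataOutputStream` has no variable-length int. `ExampleMessage`, its fields, the value of `NEW_CONSTANT`, and the default of `0` for `num` on the old protocol are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical message whose num field was added by NEW_CONSTANT. */
record ExampleMessage(String str, boolean bool, int num) {

    static final int NEW_CONSTANT = 8_414_00_0; // hypothetical id introducing the num field

    void writeTo(DataOutputStream out, int transportVersion) throws IOException {
        out.writeUTF(str);
        out.writeBoolean(bool);
        if (transportVersion >= NEW_CONSTANT) {
            out.writeInt(num); // only sent when the other node can read it
        }
    }

    static ExampleMessage readFrom(DataInputStream in, int transportVersion) throws IOException {
        String str = in.readUTF();
        boolean bool = in.readBoolean();
        // Mirrors writeTo exactly: below NEW_CONSTANT num was never written, so use a default.
        int num = transportVersion >= NEW_CONSTANT ? in.readInt() : 0;
        return new ExampleMessage(str, bool, num);
    }

    /** Serializes and deserializes at the given negotiated version. */
    static ExampleMessage roundTrip(ExampleMessage message, int transportVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            message.writeTo(out, transportVersion);
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return readFrom(in, transportVersion);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```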

If a transport version change needs to be reverted, a **new** version constant
should be added representing the revert, and the version id checks
adjusted to use the modified protocol only from the version id in which
the change was added (inclusive) up to the new version id used for the revert (exclusive).
The `between` method can be used for this.
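
A minimal sketch of the revert pattern, assuming plain `int` ids and a `between` check with an inclusive lower bound and an exclusive upper bound as described above. `RevertSketch` and both constant values are hypothetical.

```java
/** Hypothetical model of gating a reverted change with a half-open version range. */
final class RevertSketch {
    private RevertSketch() {}

    static final int CHANGE_ADDED = 8_414_00_0;    // hypothetical id that introduced the change
    static final int CHANGE_REVERTED = 8_420_00_0; // hypothetical new id representing the revert

    /** Lower bound inclusive, upper bound exclusive. */
    static boolean between(int version, int lowerInclusive, int upperExclusive) {
        return version >= lowerInclusive && version < upperExclusive;
    }

    static boolean useModifiedProtocol(int transportVersion) {
        return between(transportVersion, CHANGE_ADDED, CHANGE_REVERTED);
    }
}
```

Versions from `CHANGE_REVERTED` onwards fall back to the original protocol, while the versions that shipped with the change keep their meaning.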

Once a transport change with a new version has been merged into main or a release branch,
it **must not** be modified, so that the meaning of that specific
transport version does not change.

_Elastic developers_: please see the corresponding documentation for Serverless
on creating transport versions for Serverless changes.

### Collapsing transport versions

As each change adds a new constant, the list of constants in `TransportVersions`
will keep growing. However, once there has been an official release of Elasticsearch
that includes a change, the specific transport version for that change is no longer needed,
apart from constants that happen to be used for release builds.
As part of managing transport versions, consecutive transport versions can
periodically be collapsed into those that are used for release builds.
This task is normally performed by Core/Infra on a semi-regular basis,
usually after each new minor release, to collapse the transport versions
for the previous minor release. An example of such an operation can be found
[here](https://github.com/elastic/elasticsearch/pull/104937).

### Minimum compatibility versions

The transport version used between two nodes is determined by the initial handshake
(see `TransportHandshaker`, where the two nodes swap their highest known transport versions).
The lowest transport version that is compatible with the current node
is determined by `TransportVersions.MINIMUM_COMPATIBLE`,
and a node below that version is prevented from joining the cluster.
This constant should be updated manually on a major release.
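
The check can be sketched as follows, again modelling versions as plain `int` ids. The class name, the exception type, and the value of `MINIMUM_COMPATIBLE` are assumptions for illustration only.

```java
/** Hypothetical model of rejecting a peer whose highest version is below the minimum. */
final class CompatibilitySketch {
    private CompatibilitySketch() {}

    static final int MINIMUM_COMPATIBLE = 8_000_00_0; // hypothetical value

    /** Returns the negotiated version, or throws if the remote node is too old. */
    static int handshake(int localHighest, int remoteHighest) {
        if (remoteHighest < MINIMUM_COMPATIBLE) {
            throw new IllegalStateException("remote transport version " + remoteHighest
                    + " is below minimum compatible version " + MINIMUM_COMPATIBLE);
        }
        return Math.min(localHighest, remoteHighest);
    }
}
```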

The minimum version that can be used for CCS is determined by
`TransportVersions.MINIMUM_CCS_VERSION`, but this is not actively checked
before queries are performed. An action is rejected only if a query cannot
be serialized at that version. This constant is updated automatically
as part of performing a release.

### Mapping to release versions

For releases that do use a version number, it can be confusing to encounter
a log or exception message that references an arbitrary transport version
when you don't know which release version it corresponds to. This is where
the `toReleaseVersion()` method comes in. It uses metadata stored in a CSV file
(`TransportVersions.csv`) to map from the transport version id to the corresponding
release version. For any transport version it encounters without a direct mapping,
it makes a best guess based on the information it has. The CSV file
is updated automatically as part of performing a release.

In releases that do not have a release version number, this method becomes
a no-op.
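
The lookup can be pictured as a sorted map from id to release version. The sketch below is hypothetical throughout: the constructor takes the mappings directly instead of reading `TransportVersions.csv`, and its fallback for unknown ids (a range between the nearest known releases) is an assumed strategy, not necessarily the best guess the real method makes.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical id-to-release lookup with a range fallback for unmapped ids. */
final class ReleaseVersionLookup {
    private final NavigableMap<Integer, String> idToRelease = new TreeMap<>();

    ReleaseVersionLookup(Map<Integer, String> knownMappings) {
        idToRelease.putAll(knownMappings);
    }

    String toReleaseVersion(int id) {
        String exact = idToRelease.get(id);
        if (exact != null) {
            return exact;
        }
        // Assumed best guess: the id sits somewhere between the nearest known releases.
        Map.Entry<Integer, String> lower = idToRelease.lowerEntry(id);
        Map.Entry<Integer, String> higher = idToRelease.higherEntry(id);
        if (lower != null && higher != null) {
            return lower.getValue() + "-" + higher.getValue();
        }
        if (lower != null) {
            return lower.getValue() + "+"; // newer than every known release
        }
        return "unknown";
    }
}
```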

### Managing patches and backports

Backporting transport version changes to previous releases
should only be done if absolutely necessary, as it is very easy to get wrong
and can break the release in a way that is very hard to recover from.

If we consider the version number as an incrementing line, what we are doing is
grafting a change that takes effect at a certain point on the line
so that it also takes effect in a fixed window earlier on the line.

To take an example using indicative version numbers: when the latest
transport version is 52, we decide we need to backport a change made in
transport version 50 to transport version 45. We use the `P` version id component
to create version 45.1 with the backported change.
This change will apply for version ids 45.1 to 45.9 (should they exist in the future).

The serialization code in the backport needs to use the backported protocol
for all version numbers 45.1 to 45.9. The `TransportVersion.isPatchFrom` method
can be used to easily determine if this is the case: `streamVersion.isPatchFrom(45.1)`.
However, `onOrAfter` also does what is needed on the patch branch itself, because
a connection from that branch can never negotiate a version above the branch's own highest id.
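
The difference between the two checks can be sketched with the indicative numbers above, writing 45.1 as the `int` literal `45_1` so that the last digit is the `P` component. `PatchSketch` is hypothetical, and its `isPatchFrom` only models the described behaviour (from the given patch up to the last patch on the same line); the real `TransportVersion.isPatchFrom` may be implemented differently.

```java
/** Hypothetical model of onOrAfter versus isPatchFrom, with P as the last digit. */
final class PatchSketch {
    private PatchSketch() {}

    static boolean onOrAfter(int version, int other) {
        return version >= other;
    }

    /** True from the given patch id up to the last patch on the same line (45_1 to 45_9). */
    static boolean isPatchFrom(int version, int from) {
        int nextBase = from - from % 10 + 10; // 46_0, the first id after the 45.x patches
        return version >= from && version < nextBase;
    }
}
```

On a branch that also contains versions 46 to 52, `onOrAfter(45_1)` would wrongly match ids such as `46_0`, which never had the backported change; `isPatchFrom` excludes them.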

The serialization code in version 53 then needs to additionally check
version numbers 45.1 to 45.9 to use the backported protocol, also using the `isPatchFrom` method.

As an example, [this transport change](https://github.com/elastic/elasticsearch/pull/107862)
was backported from 8.15 to [8.14.0](https://github.com/elastic/elasticsearch/pull/108251)
and [8.13.4](https://github.com/elastic/elasticsearch/pull/108250) at the same time
(8.14 was a build candidate at the time). In the snippets below, names such as
`8.13_backport_id` are placeholders for the relevant version constants.

The 8.13 PR has:

    if (transportVersion.onOrAfter(8.13_backport_id))

The 8.14 PR has:

    if (transportVersion.isPatchFrom(8.13_backport_id)
            || transportVersion.onOrAfter(8.14_backport_id))

The 8.15 PR has:

    if (transportVersion.isPatchFrom(8.13_backport_id)
            || transportVersion.isPatchFrom(8.14_backport_id)
            || transportVersion.onOrAfter(8.15_transport_id))
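
To see why the 8.15 condition needs all three clauses, the sketch below evaluates it against made-up ids in `M_NNN_SS_P` form (the real constants differ). `BackportCheck` and its constants are hypothetical, and `isPatchFrom` models the behaviour described earlier rather than the real implementation.

```java
/** Hypothetical model of the 8.15 condition, with made-up ids. */
final class BackportCheck {
    private BackportCheck() {}

    static final int BACKPORT_8_13 = 8_100_00_1; // patch id added on the 8.13 line
    static final int BACKPORT_8_14 = 8_200_00_1; // patch id added on the 8.14 line
    static final int CHANGE_8_15 = 8_300_00_0;   // the original change on the main line

    /** From the given patch id up to the last patch sharing its M_NNN_SS prefix. */
    static boolean isPatchFrom(int version, int from) {
        return version >= from && version < from - from % 10 + 10;
    }

    static boolean hasChange(int transportVersion) {
        return isPatchFrom(transportVersion, BACKPORT_8_13)
                || isPatchFrom(transportVersion, BACKPORT_8_14)
                || transportVersion >= CHANGE_8_15;
    }
}
```

An unpatched 8.14 id such as `8_150_00_0` lies between the two backport ids, so a single `onOrAfter(BACKPORT_8_13)` would wrongly report that it has the change.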

In particular, if you are backporting a change to a patch release,
you also need to make sure that any subsequently released version on any branch
also has that change, and knows about the patch backport ids and what they mean.

## Index version

Index version is a single incrementing version number for the index data format,
metadata, and associated mappings. It is declared the same way as the
transport version, with the pattern `M_NNN_SS_P` for the major version, version id,
subsidiary version id, and patch number respectively.

The index version is stored in index metadata when an index is created,
and it is used to determine the storage format and what functionality that index supports.
The index version does not change once an index is created.
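
Because the creation version is fixed, behaviour that depends on it can be gated with a simple comparison. A minimal sketch, where `IndexInfo` and `NEW_STORAGE_FORMAT` are hypothetical stand-ins for index metadata and a constant for some format change, with index versions modelled as plain `int` ids.

```java
/** Hypothetical stand-in for index metadata carrying the fixed creation version. */
record IndexInfo(String name, int createdVersion) {

    static final int NEW_STORAGE_FORMAT = 8_500_00_0; // hypothetical index version constant

    boolean usesNewStorageFormat() {
        // createdVersion never changes, so an index keeps the format it was
        // created with, even after the cluster is upgraded.
        return createdVersion >= NEW_STORAGE_FORMAT;
    }
}
```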

In the same way as for transport versions, when a change is needed to the index
data format or metadata, or new mapping types are added, create a new version constant
in `IndexVersions` below the last one, incrementing the `NNN` version component.

Unlike transport versions, index version constants cannot be collapsed together,
as an index keeps its creation version id once it is created.
Fortunately, new index versions are only created once a month or so,
so we don’t have a large list of index versions that need managing.

Similar to transport versions, index versions have a `toReleaseVersion` method to map
onto release versions, in appropriate situations.

## Cluster Features

Cluster features are identifiers, published by a node in cluster state,
indicating that the node supports a particular top-level operation or set of functionality.
They are used for internal checks within Elasticsearch, and for gating tests
on certain functionality; for example, to check that all nodes have been upgraded
to a certain point before running a large migration operation to a new data format.
Cluster features should not be referenced by anything outside the Elasticsearch codebase.

Cluster features are indicative of top-level functionality introduced to
Elasticsearch, e.g. a new transport endpoint or new operations.

Cluster features are also used to check whether nodes can join a cluster: once all nodes
in a cluster support a particular feature, no node that does not support that feature
can then join the cluster. This ensures that once a feature is supported
by a cluster, it will always be supported in the future.

To declare a new cluster feature, add an implementation of the `FeatureSpecification` SPI,
suitably registered (or use an existing one for your code area), and add the feature
as a constant to be returned by `getFeatures`. To then check whether all nodes
in the cluster support that feature, use the `clusterHasFeature` method on `FeatureService`.
It is only possible to check whether all nodes in the cluster have a feature;
individual node checks should not be done.
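
The semantics of the two checks described above (whether the cluster has a feature, and whether a node may join) can be modelled with plain sets of feature ids. This is a simplified, hypothetical sketch: it recomputes the cluster's features from the current nodes, whereas the real implementation records them in cluster state and exposes them through `FeatureService`.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of cluster features as per-node sets of feature ids. */
final class ClusterFeaturesSketch {
    private ClusterFeaturesSketch() {}

    /** Features supported by every node; this is the only set that can be checked. */
    static Set<String> clusterFeatures(List<Set<String>> nodeFeatures) {
        if (nodeFeatures.isEmpty()) {
            return Set.of();
        }
        Set<String> common = new HashSet<>(nodeFeatures.get(0));
        for (Set<String> features : nodeFeatures) {
            common.retainAll(features);
        }
        return common;
    }

    static boolean clusterHasFeature(List<Set<String>> nodeFeatures, String feature) {
        return clusterFeatures(nodeFeatures).contains(feature);
    }

    /** A node may join only if it supports every feature the cluster already has. */
    static boolean canJoin(List<Set<String>> nodeFeatures, Set<String> joiningNode) {
        return joiningNode.containsAll(clusterFeatures(nodeFeatures));
    }
}
```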

Once a cluster feature is declared and deployed, it cannot be modified or removed,
or new nodes will not be able to join existing clusters.
If functionality represented by a cluster feature needs to be removed,
a new cluster feature should be added indicating that the functionality is no longer
supported, and the code modified accordingly (bearing in mind additional BwC constraints).

The cluster features infrastructure is only designed to support a few hundred features
per major release, and once features are added to a cluster they cannot be removed.
Cluster features should therefore be used sparingly;
adding too many cluster features risks increasing cluster instability.

When we release a new major version N, we limit our backwards compatibility
to the highest minor of the previous major N-1. Therefore, any cluster formed
with the new major version is guaranteed to have all features introduced during
releases of major N-1. All such features can be deemed to be met by the cluster,
the features themselves can be removed from cluster state over time,
and the feature checks can be removed from the code of major version N.

### Testing

Tests often want to check whether a certain feature is implemented or available on all nodes,
particularly BwC or mixed-cluster tests.

Rather than introducing a production feature just for a test condition,
this can be done by adding a _test feature_ in an implementation of
`FeatureSpecification.getTestFeatures`. These features will only be set
on clusters running as part of an integration test. Even so, cluster features
should be used sparingly if possible; capabilities are generally a better
option for test conditions.

In Java REST tests, cluster features can be checked using
`ESRestTestCase.clusterHasFeature(feature)`.

In YAML REST tests, conditions that use cluster features can be defined in the `requires` or `skip` sections;
see [here](https://github.com/elastic/elasticsearch/blob/main/rest-api-spec/src/yamlRestTest/resources/rest-api-spec/test/README.asciidoc#skipping-tests) for more information.

To aid with backwards compatibility tests, the test framework adds synthetic features
for each previously released Elasticsearch version, of the form `gte_v{VERSION}`
(for example `gte_v8.14.2`).
These can be used to add conditions based on previous releases. They _cannot_ be used
to check the current snapshot version; real features or capabilities should be
used instead.
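
One way to picture the synthetic features is that a node reports `gte_v{VERSION}` for every released version at or below its own. The sketch below is an assumption about the semantics, not the test framework's code; `SyntheticFeatures` is hypothetical, and the cluster-wide condition would then be the intersection across nodes, as for any other cluster feature.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of gte_v{VERSION} synthetic test features. */
final class SyntheticFeatures {
    private SyntheticFeatures() {}

    /** Compares dotted versions numerically, so 8.9.0 sorts before 8.14.2. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** One synthetic feature per released version at or below the node's own version. */
    static List<String> featuresFor(String nodeVersion, List<String> releasedVersions) {
        List<String> features = new ArrayList<>();
        for (String released : releasedVersions) {
            if (compare(released, nodeVersion) <= 0) {
                features.add("gte_v" + released);
            }
        }
        return features;
    }
}
```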

## Capabilities

The Capabilities API is a REST API for external clients to check the capabilities
of an Elasticsearch cluster. As it is dynamically calculated for every query,
it is not limited in size or usage.

A capabilities query can be used to query for 3 things:
* Is this endpoint supported for this HTTP method?
* Are these parameters of this endpoint supported?
* Are these capabilities (arbitrary string ids) of this endpoint supported?

The API returns a simple true/false, indicating whether all specified aspects
of the endpoint are supported by all nodes in the cluster.
If any aspect is not supported by any one node, the API returns `false`.

The API can also return `supported: null` (indicating unknown)
if there was a problem communicating with one or more nodes in the cluster.
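
The combination rule can be sketched as a function from per-node answers to the nullable result. `CapabilitiesAggregation` and `NodeAnswer` are hypothetical, and the precedence chosen here (any communication failure makes the result unknown, even if another node answered `false`) is an assumption that the text above does not settle.

```java
import java.util.List;

/** Hypothetical model of combining per-node capability answers. */
final class CapabilitiesAggregation {
    private CapabilitiesAggregation() {}

    /** Outcome for one node: supported, unsupported, or no usable response. */
    enum NodeAnswer { SUPPORTED, UNSUPPORTED, FAILED }

    /** Returns TRUE, FALSE, or null (unknown), mirroring supported: true/false/null. */
    static Boolean supported(List<NodeAnswer> answers) {
        if (answers.contains(NodeAnswer.FAILED)) {
            return null; // assumption: a communication failure makes the overall answer unknown
        }
        return !answers.contains(NodeAnswer.UNSUPPORTED);
    }
}
```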

All registered endpoints automatically work with the endpoint existence check.
To add support for parameter and capability queries to your REST endpoint,
implement the `supportedQueryParameters` and `supportedCapabilities` methods in your REST handler.
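
As a rough model of what a single node can answer once a handler declares these sets, consider the hypothetical interface below. It is not the real REST handler interface: its method names mirror the ones above, but the `supports` helper and the assumption of non-null return values are for illustration only.

```java
import java.util.Set;

/** Hypothetical stand-in for a REST handler declaring its parameters and capabilities. */
interface CapabilityAwareHandler {
    Set<String> supportedQueryParameters();

    Set<String> supportedCapabilities();

    /** A node supports the query only if every requested parameter and capability is declared. */
    default boolean supports(Set<String> parameters, Set<String> capabilities) {
        return supportedQueryParameters().containsAll(parameters)
                && supportedCapabilities().containsAll(capabilities);
    }
}
```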

To perform a capability query, make a REST call to the `_capabilities` API
with the parameters `method`, `path`, `parameters`, and `capabilities`.
The call will query every node in the cluster, and return `{supported: true}`
if all nodes support that specific combination of method, path, query parameters,
and endpoint capabilities. If any single aspect is not supported,
the query will return `{supported: false}`. If there are any problems
communicating with nodes in the cluster, the response will be `{supported: null}`,
indicating that support or lack thereof cannot currently be determined.
In Java REST tests, capabilities can be checked using the `clusterHasCapability` method in `ESRestTestCase`.
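
Building such a request path can be sketched as below. `CapabilitiesRequest` is hypothetical, and the comma-separated encoding of `parameters` and `capabilities` is an assumption about the API's list format; values are URL-encoded with the JDK's `URLEncoder`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper building the path and query string for a capabilities request. */
final class CapabilitiesRequest {
    private CapabilitiesRequest() {}

    static String path(String method, String path, List<String> parameters, List<String> capabilities) {
        return "/_capabilities"
                + "?method=" + encode(method)
                + "&path=" + encode(path)
                // assumption: list-valued parameters are sent comma-separated
                + "&parameters=" + encode(String.join(",", parameters))
                + "&capabilities=" + encode(String.join(",", capabilities));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

For example, `CapabilitiesRequest.path("GET", "/_endpoint", List.of("param1", "param2"), List.of("cap1", "cap2"))` produces `/_capabilities?method=GET&path=%2F_endpoint&parameters=param1%2Cparam2&capabilities=cap1%2Ccap2`.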

Similar to cluster features, YAML tests can have `skip` and `requires` conditions
specified with capabilities, like the following:

    - requires:
        capabilities:
          - method: GET
            path: /_endpoint
            parameters: [param1, param2]
            capabilities: [cap1, cap2]

`method: GET` is the default, and does not need to be explicitly specified.