
Commit 2cfe3f3

leemthompo and kosabogi authored
Updates 'Getting ready for production' page (#113679) (#113876)
* Updates 'Getting ready for production' page

* Update docs/reference/intro.asciidoc
Co-authored-by: shainaraskas <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: shainaraskas <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: shainaraskas <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: shainaraskas <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: Liam Thompson <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: Liam Thompson <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: shainaraskas <[email protected]>

* Update docs/reference/intro.asciidoc
Co-authored-by: Liam Thompson <[email protected]>

---------

Co-authored-by: shainaraskas <[email protected]>
Co-authored-by: Liam Thompson <[email protected]>

(cherry picked from commit 9568d9c)

Co-authored-by: kosabogi <[email protected]>
1 parent 6902296 commit 2cfe3f3

File tree

1 file changed: +99 −72 lines changed


docs/reference/intro.asciidoc

Lines changed: 99 additions & 72 deletions
@@ -370,99 +370,126 @@ Does not yet support full-text search.
 | <<sql-apis,`_sql`>>

 | {kibana-ref}/kuery-query.html[Kibana Query Language (KQL)]
-| Kibana Query Language (KQL) is a text-based query language for filtering data when you access it through the {kib} UI.
+| {kib} Query Language (KQL) is a text-based query language for filtering data when you access it through the {kib} UI.
 | Use KQL to filter documents where a value for a field exists, matches a given value, or is within a given range.
 | N/A

 |===

 // New html page
-// TODO: this page won't live here long term
 [[scalability]]
-=== Plan for production
-
-{es} is built to be always available and to scale with your needs. It does this
-by being distributed by nature. You can add servers (nodes) to a cluster to
-increase capacity and {es} automatically distributes your data and query load
-across all of the available nodes. No need to overhaul your application, {es}
-knows how to balance multi-node clusters to provide scale and high availability.
-The more nodes, the merrier.
-
-How does this work? Under the covers, an {es} index is really just a logical
-grouping of one or more physical shards, where each shard is actually a
-self-contained index. By distributing the documents in an index across multiple
-shards, and distributing those shards across multiple nodes, {es} can ensure
-redundancy, which both protects against hardware failures and increases
-query capacity as nodes are added to a cluster. As the cluster grows (or shrinks),
-{es} automatically migrates shards to rebalance the cluster.
-
-There are two types of shards: primaries and replicas. Each document in an index
-belongs to one primary shard. A replica shard is a copy of a primary shard.
-Replicas provide redundant copies of your data to protect against hardware
-failure and increase capacity to serve read requests
-like searching or retrieving a document.
-
-The number of primary shards in an index is fixed at the time that an index is
-created, but the number of replica shards can be changed at any time, without
-interrupting indexing or query operations.
+=== Get ready for production
+
+Many teams rely on {es} to run their key services. To keep these services running, you can design your {es} deployment
+to keep {es} available, even in case of large-scale outages. To keep it running fast, you also can design your
+deployment to be responsive to production workloads.
+
+{es} is built to be always available and to scale with your needs. It does this using a distributed architecture.
+By distributing your cluster, you can keep Elastic online and responsive to requests.
+
+In case of failure, {es} offers tools for cross-cluster replication and cluster snapshots that can
+help you fall back or recover quickly. You can also use cross-cluster replication to serve requests based on the
+geographic location of your users and your resources.
+
+{es} also offers security and monitoring tools to help you keep your cluster highly available.
+
+[discrete]
+[[use-multiple-nodes-shards]]
+==== Use multiple nodes and shards
+
+[NOTE]
+====
+Nodes and shards are what make {es} distributed and scalable.
+
+These concepts aren’t essential if you’re just getting started. How you <<elasticsearch-intro-deploy,deploy {es}>> in production determines what you need to know:
+
+* *Self-managed {es}*: You are responsible for setting up and managing nodes, clusters, shards, and replicas. This includes
+managing the underlying infrastructure, scaling, and ensuring high availability through failover and backup strategies.
+* *Elastic Cloud*: Elastic can autoscale resources in response to workload changes. Choose from different deployment types
+to apply sensible defaults for your use case. A basic understanding of nodes, shards, and replicas is still important.
+* *Elastic Cloud Serverless*: You don’t need to worry about nodes, shards, or replicas. These resources are 100% automated
+on the serverless platform, which is designed to scale with your workload.
+====
+
+You can add servers (_nodes_) to a cluster to increase capacity, and {es} automatically distributes your data and query load
+across all of the available nodes.
+
+Elastic is able to distribute your data across nodes by subdividing an index into _shards_. Each index in {es} is a grouping
+of one or more physical shards, where each shard is a self-contained Lucene index containing a subset of the documents in
+the index. By distributing the documents in an index across multiple shards, and distributing those shards across multiple
+nodes, {es} increases indexing and query capacity.
+
+There are two types of shards: _primaries_ and _replicas_. Each document in an index belongs to one primary shard. A replica
+shard is a copy of a primary shard. Replicas maintain redundant copies of your data across the nodes in your cluster.
+This protects against hardware failure and increases capacity to serve read requests like searching or retrieving a document.
+
+[TIP]
+====
+The number of primary shards in an index is fixed at the time that an index is created, but the number of replica shards can
+be changed at any time, without interrupting indexing or query operations.
+====
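As a minimal sketch of how these settings are used (the index name `my-index-000001` is a placeholder), the primary shard count is set with the create index API and cannot change afterwards, while the replica count can be adjusted on a live index with the update index settings API:

[source,console]
----
// Set the primary shard count at index creation; it cannot be changed later.
PUT /my-index-000001
{
  "settings": {
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}

// The replica count can be updated at any time, without downtime.
PUT /my-index-000001/_settings
{
  "index.number_of_replicas": 2
}
----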
+
+Shard copies in your cluster are automatically balanced across nodes to provide scale and high availability. All nodes are
+aware of all the other nodes in the cluster and can forward client requests to the appropriate node. This allows {es}
+to distribute indexing and query load across the cluster.
+
+If you’re exploring {es} for the first time or working in a development environment, then you can use a cluster with a single node and create indices
+with only one shard. However, in a production environment, you should build a cluster with multiple nodes and indices
+with multiple shards to increase performance and resilience.
+
+// TODO - diagram
+
+To learn about optimizing the number and size of shards in your cluster, refer to <<size-your-shards,Size your shards>>.
+To learn about how read and write operations are replicated across shards and shard copies, refer to <<docs-replication,Reading and writing documents>>.
+To adjust how shards are allocated and balanced across nodes, refer to <<shard-allocation-relocation-recovery,Shard allocation, relocation, and recovery>>.
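To see how shard copies are actually spread across a running cluster, the cat shards API gives a quick per-shard view; a minimal sketch:

[source,console]
----
// List each shard, its primary/replica role, state, and the node it lives on.
GET _cat/shards?v=true
----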

 [discrete]
-[[it-depends]]
-==== Shard size and number of shards
+[[ccr-disaster-recovery-geo-proximity]]
+==== CCR for disaster recovery and geo-proximity

-There are a number of performance considerations and trade offs with respect
-to shard size and the number of primary shards configured for an index. The more
-shards, the more overhead there is simply in maintaining those indices. The
-larger the shard size, the longer it takes to move shards around when {es}
-needs to rebalance a cluster.
+To effectively distribute read and write operations across nodes, the nodes in a cluster need good, reliable connections
+to each other. To provide better connections, you typically co-locate the nodes in the same data center or nearby data centers.

-Querying lots of small shards makes the processing per shard faster, but more
-queries means more overhead, so querying a smaller
-number of larger shards might be faster. In short...it depends.
+Co-locating nodes in a single location exposes you to the risk of a single outage taking your entire cluster offline. To
+maintain high availability, you can prepare a second cluster that can take over in case of disaster by implementing
+cross-cluster replication (CCR).

-As a starting point:
+CCR provides a way to automatically synchronize indices from your primary cluster to a secondary remote cluster that
+can serve as a hot backup. If the primary cluster fails, the secondary cluster can take over.

-* Aim to keep the average shard size between a few GB and a few tens of GB. For
-use cases with time-based data, it is common to see shards in the 20GB to 40GB
-range.
+You can also use CCR to create secondary clusters to serve read requests in geo-proximity to your users.

-* Avoid the gazillion shards problem. The number of shards a node can hold is
-proportional to the available heap space. As a general rule, the number of
-shards per GB of heap space should be less than 20.
+Learn more about <<xpack-ccr,cross-cluster replication>> and about <<high-availability-cluster-design,designing for resilience>>.
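A minimal sketch of starting replication with the CCR follow API, run on the secondary cluster and assuming a remote cluster connection named `leader-cluster` pointing at the primary cluster has already been configured (index and connection names are placeholders):

[source,console]
----
// Create a read-only follower index that replicates the leader index
// from the remote (primary) cluster.
PUT /my-follower-index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster": "leader-cluster",
  "leader_index": "my-leader-index"
}
----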

-The best way to determine the optimal configuration for your use case is
-through https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[
-testing with your own data and queries].
+[TIP]
+====
+You can also take <<snapshot-restore,snapshots>> of your cluster that can be restored in case of failure.
+====
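A minimal snapshot sketch, assuming a shared filesystem path registered in `path.repo` on every node (repository, snapshot, and path names are placeholders):

[source,console]
----
// Register a shared filesystem snapshot repository.
PUT _snapshot/my_repository
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups"
  }
}

// Take a snapshot of the cluster and wait for it to complete.
PUT _snapshot/my_repository/snapshot_1?wait_for_completion=true
----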

 [discrete]
-[[disaster-ccr]]
-==== Disaster recovery
+[[security-and-monitoring]]
+==== Security and monitoring

-A cluster's nodes need good, reliable connections to each other. To provide
-better connections, you typically co-locate the nodes in the same data center or
-nearby data centers. However, to maintain high availability, you
-also need to avoid any single point of failure. In the event of a major outage
-in one location, servers in another location need to be able to take over. The
-answer? {ccr-cap} (CCR).
+As with any enterprise system, you need tools to secure, manage, and monitor your {es} clusters. Security,
+monitoring, and administrative features that are integrated into {es} enable you to use {kibana-ref}/introduction.html[Kibana] as a
+control center for managing a cluster.

-CCR provides a way to automatically synchronize indices from your primary cluster
-to a secondary remote cluster that can serve as a hot backup. If the primary
-cluster fails, the secondary cluster can take over. You can also use CCR to
-create secondary clusters to serve read requests in geo-proximity to your users.
+<<secure-cluster,Learn about securing an {es} cluster>>.

-{ccr-cap} is active-passive. The index on the primary cluster is
-the active leader index and handles all write requests. Indices replicated to
-secondary clusters are read-only followers.
+<<monitor-elasticsearch-cluster,Learn about monitoring your cluster>>.
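As a simple starting point for monitoring, the cluster health API summarizes the state of the cluster in a single request; a minimal sketch:

[source,console]
----
// Returns cluster status (green/yellow/red), node counts, and shard statistics.
GET _cluster/health
----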

 [discrete]
-[[admin]]
-==== Security, management, and monitoring
+[[cluster-design]]
+==== Cluster design
+
+{es} offers many options that allow you to configure your cluster to meet your organization’s goals, requirements,
+and restrictions. You can review the following guides to learn how to tune your cluster to meet your needs:

-As with any enterprise system, you need tools to secure, manage, and
-monitor your {es} clusters. Security, monitoring, and administrative features
-that are integrated into {es} enable you to use {kibana-ref}/introduction.html[{kib}]
-as a control center for managing a cluster. Features like <<downsampling,
-downsampling>> and <<index-lifecycle-management, index lifecycle management>>
-help you intelligently manage your data over time.
+* <<high-availability-cluster-design,Designing for resilience>>
+* <<tune-for-indexing-speed,Tune for indexing speed>>
+* <<tune-for-search-speed,Tune for search speed>>
+* <<tune-for-disk-usage,Tune for disk usage>>
+* <<use-elasticsearch-for-time-series-data,Tune for time series data>>

-Refer to <<monitor-elasticsearch-cluster>> for more information.
+Many {es} options come with different performance considerations and trade-offs. The best way to determine the
+optimal configuration for your use case is through https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing[testing with your own data and queries].
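As one concrete example of such a trade-off, tuning for indexing speed often means accepting a longer refresh interval, so new documents become searchable less frequently; a sketch (the index name is a placeholder):

[source,console]
----
// Trade search freshness for indexing throughput by refreshing less often.
PUT /my-index-000001/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
----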
