
Commit b56ed9b

marreguinomicode authored and committed
Add howto guide to downscale a cluster that runs on a single host
The idea is to give readers a low entry point to clustering, from which they can expand their knowledge. A vanilla cluster is a three node cluster that runs on a single host. In this guide we create the cluster, add data to it, and then remove two nodes. Some scripts are required, which are available under the 'scripts' folder; users can download a zip containing them, or check out the repo to access them.
1 parent 116e15d commit b56ed9b

16 files changed: +752 -0 lines changed

docs/clustering/downscaling.rst

+282
@@ -0,0 +1,282 @@
.. _scaling-down:


===========
Downscaling
===========

In this howto guide we:

- Create a ``vanilla cluster``.
- Add some data to it.
- Downscale it to a single node cluster.

.. _scaling-down-starting-vanilla-cluster:

Definition of a vanilla cluster
===============================

``vanilla cluster`` is a term used in this document to refer to a three node
CrateDB cluster that runs on a single host. That is, one computer runs the
three nodes that make up the vanilla cluster, and therefore each of the nodes
shares the file system and the operating system's scheduler with the rest of
the nodes.

This configuration provides parallel processing power over large scale data
when you only have one host. It comes at the cost of increased latency in
writes, because when operating as a cluster, the nodes must reach consensus on
each write operation (insert/update).

Necessary scripts
=================

You can access the necessary scripts and configuration files by cloning the
crate-howtos_ git repository and accessing them under
*crate-howtos/scripts/downscaling*.

Starting a vanilla cluster
==========================

Proceed:

1. Explore *crate-howtos/scripts/downscaling*, which should contain:

   - *update-dist*: script to install **CrateDB**.
   - *dist*: the installed **CrateDB** distribution (created in step 2).
   - *crate*: a symlink to the installed distribution in the *dist* folder,
     where you will also find a *crate-clone* git repository.
   - *conf*: **CrateDB** configurations; each node in the cluster has a folder
     in there, with the *crate.yml* and *log4j2.properties* configuration
     files.
   - *data*: the **CrateDB** nodes will persist their data under
     *data/n<i>/nodes/0*.
   - *repo*: **CrateDB** repository for keeping snapshots.
   - *start-node*: script to start **CrateDB** with a given configuration,
     specified as a node name (e.g. n1) in the parameters to the script.
   - *detach-node*: script to detach a node from the vanilla cluster.
   - *bootstrap-node*: script to bootstrap a node to form a new cluster, which
     means recreating its cluster state so that it can be started on its own.
   - *data.py*: python3 script to produce sample data.

2. Run *./update-dist*

   - This script will install the latest, unreleased **CrateDB** under
     *dist/*, creating a link *./crate -> dist/crate..*.
   - This assumes **git**, **Java 11** or later, **python3** and a
     **terminal** are available to you, and that you have a GitHub_ account.

3. The configuration for the vanilla cluster:

   - *crate-howtos/scripts/downscaling/conf/n1/crate.yml*
   - *crate-howtos/scripts/downscaling/conf/n2/crate.yml*
   - *crate-howtos/scripts/downscaling/conf/n3/crate.yml*

   We show here the configuration of n1, which is the same as for n2 and n3,
   except for the node name (it could be removed from the configuration
   altogether and passed as ``-Cnode.name=n1`` through the ``start-node``
   script):

   ::

      cluster.name: vanilla
      node.name: n1
      network.host: _local_
      node.max_local_storage_nodes: 1
      stats.service.interval: 0

      http.cors.enabled: true
      http.cors.allow-origin: "*"

      transport.tcp.port: 4301
      gateway.expected_nodes: 3
      gateway.recover_after_nodes: 2
      discovery.seed_hosts:
      - 127.0.0.1:4301
      - 127.0.0.1:4302
      cluster.initial_master_nodes:
      - 127.0.0.1:4301
      - 127.0.0.1:4302

   These settings are explained in cluster-wide-settings_ and
   node-specific-settings_.

4. Run *./start-node* in three different terminals, one for each node:

   - *./start-node n1*
   - *./start-node n2*
   - *./start-node n3*

   This will form the ``vanilla cluster`` and elect a master. You can interact
   with the ``vanilla cluster`` by opening a browser and pointing it to
   *http://localhost:4200*, **CrateDB**'s `Admin UI`_.
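
Once the three nodes are up, you can also check cluster membership from a
script. The following is a minimal sketch (the file name *check-nodes.py* and
the ``sql()`` helper are made up for this guide; it assumes python3 and the
HTTP endpoint on port 4200) that queries the ``sys.nodes`` table through
**CrateDB**'s ``/_sql`` endpoint:

::

   # check-nodes.py -- minimal sketch, assumes the HTTP endpoint is on port 4200
   import json
   import urllib.request

   # small helper: POST a SQL statement to CrateDB's HTTP endpoint
   def sql(stmt, url="http://localhost:4200/_sql"):
       body = json.dumps({"stmt": stmt}).encode("utf-8")
       req = urllib.request.Request(
           url, data=body, headers={"Content-Type": "application/json"})
       with urllib.request.urlopen(req) as resp:
           return json.loads(resp.read())["rows"]

   # expect three rows (n1, n2, n3), all reporting the same hostname
   for name, hostname in sql("SELECT name, hostname FROM sys.nodes ORDER BY name"):
       print(name, hostname)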

.. _scaling-down-adding-data:

Adding some data to the vanilla cluster
=======================================

Proceed:

1. Produce a CSV_ file containing 3600 rows of log data (1 hour's worth of
   logs @1Hz):

   ::

      python3 data.py > logs.csv

2. In the `Admin UI`_:

   ::

      CREATE TABLE logs (log_time timestamp NOT NULL,
                         client_ip ip NOT NULL,
                         request string NOT NULL,
                         status_code short NOT NULL,
                         object_size long NOT NULL);

      COPY logs FROM 'file:/// /crate-howtos/scripts/downscaling/logs.csv';
      REFRESH TABLE logs;
      SELECT * FROM logs ORDER BY log_time LIMIT 10800;

   The three nodes perform the copy operation (remember, we are operating as a
   cluster), so we expect to see 3600 * 3 = 10800 rows inserted, which looks
   like "repeated" data. Because we did not define a primary key, **CrateDB**
   created the default *_id* primary key for each row, and this was done at
   each node. The result is that each node inserted a row per line in the csv
   file, with a cluster wide unique default *_id*, and we perceive this as a
   triplication of the data. If you do not want to see triplication, define a
   primary key.
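
You can confirm the row count from a script as well. A minimal sketch (the
file name *count-logs.py* and the ``sql()`` helper are made up for this guide;
it assumes the HTTP endpoint on port 4200), which should print 10800 for the
vanilla cluster:

::

   # count-logs.py -- minimal sketch, assumes the HTTP endpoint is on port 4200
   import json
   import urllib.request

   # same small helper as in the earlier sketch: POST a statement to /_sql
   def sql(stmt, url="http://localhost:4200/_sql"):
       body = json.dumps({"stmt": stmt}).encode("utf-8")
       req = urllib.request.Request(
           url, data=body, headers={"Content-Type": "application/json"})
       with urllib.request.urlopen(req) as resp:
           return json.loads(resp.read())["rows"]

   # 3600 rows per node * 3 nodes = 10800
   print(sql("SELECT count(*) FROM logs")[0][0])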

.. _scaling-down-exploring-the-data:

Exploring the data
==================

Using the `Admin UI`_, shards view on the left:

.. image:: shards-view.png

We can see the three nodes, each having a number of shards, like so:

+-------+---+---+---+---+---+---+
| Shard | 0 | 1 | 2 | 3 | 4 | 5 |
+=======+===+===+===+===+===+===+
| n1    | . | . | . |   | . |   |
+-------+---+---+---+---+---+---+
| n2    | . | . |   | . |   | . |
+-------+---+---+---+---+---+---+
| n3    |   |   | . | . | . | . |
+-------+---+---+---+---+---+---+

Thus, in this cluster setup one node can crash, yet the data in the cluster
remains fully available, because any two nodes together have access to all the
shards when they work to fulfill query requests. A SQL table is a composite of
shards (six in our case). When a query is executed, the planner defines steps
for accessing all the shards of the table. By adding nodes to the cluster, the
data is spread over more nodes, so that the computation is parallelized.
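
The same distribution can be read from the ``sys.shards`` table. A minimal
sketch (the file name *list-shards.py* and the ``sql()`` helper are made up
for this guide; it assumes the HTTP endpoint on port 4200) that lists which
node holds which shard of *logs*:

::

   # list-shards.py -- minimal sketch, assumes the HTTP endpoint is on port 4200
   import json
   import urllib.request

   # same small helper as in the earlier sketches: POST a statement to /_sql
   def sql(stmt, url="http://localhost:4200/_sql"):
       body = json.dumps({"stmt": stmt}).encode("utf-8")
       req = urllib.request.Request(
           url, data=body, headers={"Content-Type": "application/json"})
       with urllib.request.urlopen(req) as resp:
           return json.loads(resp.read())["rows"]

   # one line per shard copy: the node holding it, the shard id, and its state
   stmt = ("SELECT node['name'], id, state FROM sys.shards "
           "WHERE table_name = 'logs' ORDER BY 1, 2")
   for node_name, shard_id, state in sql(stmt):
       print(node_name, shard_id, state)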

Having a look at the setup for table *logs*:

::

   SHOW CREATE TABLE logs;

will return:

::

   CREATE TABLE IF NOT EXISTS "doc"."logs" (
      "log_time" TIMESTAMP WITH TIME ZONE NOT NULL,
      "client_ip" IP NOT NULL,
      "request" TEXT NOT NULL,
      "status_code" SMALLINT NOT NULL,
      "object_size" BIGINT NOT NULL
   )
   CLUSTERED INTO 6 SHARDS
   WITH (
      number_of_replicas = '0-1',
   )

We have a default minimum number of replicas of zero, and a maximum of one,
for each of our six shards. A replica is simply a copy of a shard.
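
If you only want this setting, without the full ``SHOW CREATE TABLE`` output,
you can read it from ``information_schema.tables``. A minimal sketch (the
``sql()`` helper is made up for this guide; it assumes the HTTP endpoint on
port 4200):

::

   # show-replicas.py -- minimal sketch, assumes the HTTP endpoint is on port 4200
   import json
   import urllib.request

   # same small helper as in the earlier sketches: POST a statement to /_sql
   def sql(stmt, url="http://localhost:4200/_sql"):
       body = json.dumps({"stmt": stmt}).encode("utf-8")
       req = urllib.request.Request(
           url, data=body, headers={"Content-Type": "application/json"})
       with urllib.request.urlopen(req) as resp:
           return json.loads(resp.read())["rows"]

   stmt = ("SELECT number_of_shards, number_of_replicas "
           "FROM information_schema.tables WHERE table_name = 'logs'")
   print(sql(stmt)[0])  # expected: [6, '0-1']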


.. _scaling-down-downscaling:

Downscaling (by means of replicas)
==================================

Downscaling by means of replicas is achieved by making sure the surviving
nodes of the cluster have access to all the shards, even when the other nodes
are missing.

1. We need to ensure that the number of replicas matches the number of nodes:

   ::

      ALTER TABLE logs SET (number_of_replicas = '1-all');

   In the `Admin UI`_, we can follow the progress of replication (a sketch
   that polls the table health is shown after this list).

2. After replication has completed, we can take down all the nodes in the
   cluster (*Ctrl+C* in each terminal).

3. Run *./detach-node n<i>*, where i is 2 or 3, to detach **n2** and **n3**
   from the cluster. We will let **n1** form a new cluster all by itself, with
   access to the original data. The command succeeds, but prints an exception
   that you can safely ignore:

   ::

      Node was successfully detached from the cluster
      Exception in thread "Thread-0" java.lang.NoClassDefFoundError: org/elasticsearch/core/internal/io/IOUtils
          at org.elasticsearch.cli.MultiCommand.close(MultiCommand.java:82)
          at org.elasticsearch.cli.Command.lambda$main$0(Command.java:70)
          at java.base/java.lang.Thread.run(Thread.java:832)
      Caused by: java.lang.ClassNotFoundException: org.elasticsearch.core.internal.io.IOUtils
          at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
          at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
          at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)

4. Change **n1**'s configuration in *crate.yml*. The best practice is to
   select the node that was the master, as then we know it had the latest
   version of the cluster state. For our tutorial, we are running on a single
   host, so the cluster state is more or less guaranteed to be consistent
   across all nodes. In principle, however, the cluster could be running
   across multiple hosts, and then we would want the master node to become the
   new single node cluster:

   ::

      cluster.name: vanilla  # no need to change this
      node.name: n1
      stats.service.interval: 0
      network.host: _local_
      node.max_local_storage_nodes: 1

      http.cors.enabled: true
      http.cors.allow-origin: "*"

      transport.tcp.port: 4301
      #gateway.expected_nodes: 3
      #gateway.recover_after_nodes: 2
      #discovery.seed_hosts:
      # - 127.0.0.1:4301
      # - 127.0.0.1:4302
      #cluster.initial_master_nodes:
      # - 127.0.0.1:4301
      # - 127.0.0.1:4302

5. Run *./bootstrap-node n1* to let **n1** form a new cluster when it starts.

6. Run *./start-node n1*.
   Do not panic: the cluster state is *[YELLOW]*; we sort that out with:

   ::

      ALTER TABLE logs SET (number_of_replicas = '0-1');
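
Whether you are waiting for the extra replicas in step 1, or for the single
node cluster to go green after step 6, you can watch the table health from a
script. A minimal sketch (the file name *watch-health.py* and the ``sql()``
helper are made up for this guide; it assumes the HTTP endpoint on port 4200
and **CrateDB**'s ``sys.health`` table):

::

   # watch-health.py -- minimal sketch, assumes the HTTP endpoint is on port 4200
   import json
   import time
   import urllib.request

   # same small helper as in the earlier sketches: POST a statement to /_sql
   def sql(stmt, url="http://localhost:4200/_sql"):
       body = json.dumps({"stmt": stmt}).encode("utf-8")
       req = urllib.request.Request(
           url, data=body, headers={"Content-Type": "application/json"})
       with urllib.request.urlopen(req) as resp:
           return json.loads(resp.read())["rows"]

   # poll until the table reports GREEN, then confirm the data survived
   while True:
       health = sql("SELECT health FROM sys.health WHERE table_name = 'logs'")[0][0]
       print("logs:", health)
       if health == "GREEN":
           break
       time.sleep(2)

   print("rows:", sql("SELECT count(*) FROM logs")[0][0])  # still 10800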

Further reading: crate-node-tool_.


.. _crate-howtos: https://github.com/crate/crate-howtos
.. _GitHub: https://github.com/crate/crate.git
.. _cluster-wide-settings: https://crate.io/docs/crate/reference/en/latest/config/cluster.html
.. _node-specific-settings: https://crate.io/docs/crate/reference/en/latest/config/node.html
.. _`Admin UI`: http://localhost:4200
.. _crate-node: https://crate.io/docs/crate/reference/en/latest/cli-tools.html#cli-crate-node
.. _CSV: https://en.wikipedia.org/wiki/Comma-separated_values
.. _crate-node-tool: https://crate.io/docs/crate/guide/en/latest/best-practices/crate-node.html
