.. _scaling-down:

===========
Downscaling
===========

In this how-to guide we will:

- Create a ``vanilla cluster``.
- Add some data to it.
- Downscale it to a single-node cluster.

.. _scaling-down-starting-vanilla-cluster:

Definition of a vanilla cluster
===============================

``vanilla cluster`` is the term used in this document to refer to a three-node
CrateDB cluster that runs on a single host. That is, one computer runs the three
nodes that make up the vanilla cluster, so each node shares the file system and
the operating system's scheduler with the other nodes.

This configuration provides parallel processing power over large-scale data when
you only have one host. It comes at the cost of increased write latency, because
when operating as a cluster, the nodes must reach consensus on each write operation
(insert/update).

Necessary scripts
=================

You can access the necessary scripts and configuration files by cloning the
crate-howtos_ git repository; they live under *crate-howtos/scripts/downscaling*.

Starting a vanilla cluster
==========================

Proceed:

1. Explore *crate-howtos/scripts/downscaling*, which should contain:

   - *update-dist*: script to install **CrateDB**.
   - *dist*: the installed **CrateDB** distribution (created in step 2).
   - *crate*: a symlink to the installed distribution in the *dist* folder, where
     you will also find a *crate-clone* git repository.
   - *conf*: **CrateDB** configurations; each node in the cluster has a folder
     in there, with the *crate.yml* and *log4j2.properties* configuration files.
   - *data*: the **CrateDB** nodes persist their data under *data/n<i>/nodes/0*.
   - *repo*: **CrateDB** repository for keeping snapshots.
   - *start-node*: script to start **CrateDB** with a given configuration, specified
     as a node name (e.g. n1) in the parameters to the script.
   - *detach-node*: script to detach a node from the vanilla cluster.
   - *bootstrap-node*: script to bootstrap a node to form a new cluster, which
     means recreating its cluster state so that it may be started on its own.
   - *data.py*: Python 3 script that produces sample data.

2. Run *./update-dist*

   - This script installs the latest, unreleased **CrateDB** under *dist/*,
     creating a link *./crate -> dist/crate..*.
   - It assumes **git**, **Java 11** or later, **python3**, and a **terminal** are
     available to you, and that you have a GitHub_ account.

3. The configuration for the vanilla cluster lives in:

   - *crate-howtos/scripts/downscaling/conf/n1/crate.yml*
   - *crate-howtos/scripts/downscaling/conf/n2/crate.yml*
   - *crate-howtos/scripts/downscaling/conf/n3/crate.yml*

   We show here the configuration of n1, which is exactly the same as for n2
   and n3, apart from the node name (it could be removed from the configuration
   altogether and passed as ``-Cnode.name=n1`` through the ``start-node``
   script):

   ::

      cluster.name: vanilla
      node.name: n1
      network.host: _local_
      node.max_local_storage_nodes: 1
      stats.service.interval: 0

      http.cors.enabled: true
      http.cors.allow-origin: "*"

      transport.tcp.port: 4301
      gateway.expected_nodes: 3
      gateway.recover_after_nodes: 2
      discovery.seed_hosts:
        - 127.0.0.1:4301
        - 127.0.0.1:4302
      cluster.initial_master_nodes:
        - 127.0.0.1:4301
        - 127.0.0.1:4302

   These settings are explained in cluster-wide-settings_ and node-specific-settings_.

4. Run *./start-node* in three different terminals, one for each node:

   - *./start-node n1*
   - *./start-node n2*
   - *./start-node n3*

   This forms the ``vanilla cluster`` and elects a master. You can
   interact with the ``vanilla cluster`` by opening a browser and pointing
   it to *http://localhost:4200*, **CrateDB**'s `Admin UI`_.

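   To confirm that all three nodes have joined, you can also query the
   ``sys.nodes`` system table, for instance from the `Admin UI`_ console
   (an optional sanity check; the expected result is one row per node):

   ::

      -- list the nodes that are currently part of the cluster
      SELECT name, hostname
      FROM sys.nodes
      ORDER BY name;
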

.. _scaling-down-adding-data:

Adding some data to the vanilla cluster
=======================================

Proceed:

1. Produce a CSV_ file containing 3600 rows of log data (one hour's worth of logs at 1 Hz):

   ::

      python3 data.py > logs.csv

2. In the `Admin UI`_:

   ::

      CREATE TABLE logs (log_time timestamp NOT NULL,
                         client_ip ip NOT NULL,
                         request string NOT NULL,
                         status_code short NOT NULL,
                         object_size long NOT NULL);

      COPY logs FROM 'file:/// /crate-howtos/scripts/downscaling/logs.csv';
      REFRESH TABLE logs;
      SELECT * FROM logs ORDER BY log_time LIMIT 10800;

   The three nodes perform the copy operation (remember, we are operating as a
   cluster), so we expect to see 3600 * 3 rows inserted, which looks like
   "repeated" data. Because we did not define a primary key, **CrateDB** created
   the default *_id* primary key for each row, and this was done at each node.
   The result is that each node inserted a row per line in the CSV file, with a
   cluster-wide unique default *_id*, and we perceive this as a triplication of
   the data. If you do not want to see triplication, define a primary key.

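   A quick way to verify this behaviour (assuming the import went as described
   above) is to compare the total row count with the number of distinct
   timestamps:

   ::

      -- 10800 rows in total, but only 3600 distinct timestamps
      SELECT count(*) AS total_rows,
             count(DISTINCT log_time) AS distinct_timestamps
      FROM logs;
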
.. _scaling-down-exploring-the-data:

Exploring the Data
==================

Using the `Admin UI`_'s shards view (on the left):

.. image:: shards-view.png

We can see the three nodes, each holding a number of shards, like so:

+-------+---+---+---+---+---+---+
| Shard | 0 | 1 | 2 | 3 | 4 | 5 |
+=======+===+===+===+===+===+===+
| n1    | . | . | . |   | . |   |
+-------+---+---+---+---+---+---+
| n2    | . | . |   | . |   | . |
+-------+---+---+---+---+---+---+
| n3    |   |   | . | . | . | . |
+-------+---+---+---+---+---+---+

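The same allocation can also be read from SQL via the ``sys.shards`` system
table (an optional check; the exact distribution on your machine may differ):

::

   -- one row per shard copy, with the node it is allocated on
   SELECT node['name'] AS node, id AS shard, "primary"
   FROM sys.shards
   WHERE schema_name = 'doc' AND table_name = 'logs'
   ORDER BY id, node['name'];
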
Thus, in this cluster setup, one node can crash, yet the data in the cluster
will remain fully available, because any two nodes together have access to all
the shards when they cooperate to fulfill query requests. A SQL table is a
composite of shards (six in our case). When a query is executed, the planner
defines steps for accessing all the shards of the table. By adding nodes to the
cluster, the data is spread over more nodes, so that computation is parallelized.

Taking a look at the definition of table *logs*:

::

   SHOW CREATE TABLE logs;

This will return:

::

   CREATE TABLE IF NOT EXISTS "doc"."logs" (
      "log_time" TIMESTAMP WITH TIME ZONE NOT NULL,
      "client_ip" IP NOT NULL,
      "request" TEXT NOT NULL,
      "status_code" SMALLINT NOT NULL,
      "object_size" BIGINT NOT NULL
   )
   CLUSTERED INTO 6 SHARDS
   WITH (
      ...
      number_of_replicas = '0-1',
      ...
   )

We have a default minimum of zero replicas and a maximum of one replica for each
of our six shards. A replica is simply a copy of a shard.

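The shard and replica settings can also be read from ``information_schema.tables``
(shown here as an optional check):

::

   SELECT number_of_shards, number_of_replicas
   FROM information_schema.tables
   WHERE table_schema = 'doc' AND table_name = 'logs';
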

.. _scaling-down-downscaling:

Downscaling (by means of replicas)
==================================

Downscaling by means of replicas is achieved by making sure the surviving nodes
of the cluster have access to all the shards, even when the other nodes are missing.

1. We need to ensure that every node ends up holding a copy of every shard:

   ::

      ALTER TABLE logs SET (number_of_replicas = '1-all');

   In the `Admin UI`_, we can follow the progress of replication.

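   Replication progress can also be checked from SQL, for example with the
   ``sys.health`` table (available in recent CrateDB versions); once
   ``underreplicated_shards`` drops to 0, the table is fully replicated:

   ::

      SELECT health, underreplicated_shards, missing_shards
      FROM sys.health
      WHERE table_name = 'logs';
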
2. After replication has completed, we can take down all the nodes in the cluster
   (*Ctrl+C* in each terminal).

3. Run *./detach-node n<i>*, with i in {2, 3}, to detach **n2** and **n3** from the
   cluster. We will let **n1** form a new cluster all by itself, with access to the
   original data. The command succeeds, but it prints an exception that you can
   safely ignore:

   ::

      Node was successfully detached from the cluster
      Exception in thread "Thread-0" java.lang.NoClassDefFoundError: org/elasticsearch/core/internal/io/IOUtils
              at org.elasticsearch.cli.MultiCommand.close(MultiCommand.java:82)
              at org.elasticsearch.cli.Command.lambda$main$0(Command.java:70)
              at java.base/java.lang.Thread.run(Thread.java:832)
      Caused by: java.lang.ClassNotFoundException: org.elasticsearch.core.internal.io.IOUtils
              at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:602)
              at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
              at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)

4. Change **n1**'s configuration *crate.yml*. Best practice is to select the node
   that was master, as we then know it had the latest version of the cluster state.
   In this tutorial we are running on a single host, so the cluster state is more
   or less guaranteed to be consistent across all nodes. In principle, however, the
   cluster could be running across multiple hosts, and then we would want the
   master node to become the new single-node cluster:

   ::

      cluster.name: vanilla      # no need to change this
      node.name: n1
      stats.service.interval: 0
      network.host: _local_
      node.max_local_storage_nodes: 1

      http.cors.enabled: true
      http.cors.allow-origin: "*"

      transport.tcp.port: 4301
      #gateway.expected_nodes: 3
      #gateway.recover_after_nodes: 2
      #discovery.seed_hosts:
      #  - 127.0.0.1:4301
      #  - 127.0.0.1:4302
      #cluster.initial_master_nodes:
      #  - 127.0.0.1:4301
      #  - 127.0.0.1:4302

5. Run *./bootstrap-node n1* to let **n1** form a new cluster when it starts.

6. Run *./start-node n1*.
   Do not panic: the cluster state is *[YELLOW]*; we sort that out with:

   ::

      ALTER TABLE logs SET (number_of_replicas = '0-1');

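   With the replica requirement relaxed, the single-node cluster should report
   *[GREEN]* again and still hold all of the data (the expected count assumes
   the import from the previous section):

   ::

      -- all 10800 rows are still served by n1 alone
      SELECT count(*) FROM logs;
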
Further reading: crate-node-tool_.


.. _crate-howtos: https://github.com/crate/crate-howtos
.. _GitHub: https://github.com/crate/crate.git
.. _cluster-wide-settings: https://crate.io/docs/crate/reference/en/latest/config/cluster.html
.. _node-specific-settings: https://crate.io/docs/crate/reference/en/latest/config/node.html
.. _`Admin UI`: http://localhost:4200
.. _crate-node: https://crate.io/docs/crate/reference/en/latest/cli-tools.html#cli-crate-node
.. _CSV: https://en.wikipedia.org/wiki/Comma-separated_values
.. _crate-node-tool: https://crate.io/docs/crate/guide/en/latest/best-practices/crate-node.html