Add tutorial on how to remove nodes from a cluster #159

marregui · 2020-03-05T20:54:04Z

This tutorial demonstrates, in a roundabout way, how to use the crate-node tool to detach and bootstrap nodes.

First, a cluster is set up by pointing the reader to existing tutorials and by explaining how to build CrateDB from source and test locally rather than on a production cluster. Then some data is loaded, with references to the ISS tutorial (which is awesome) and suggestions of alternative ways to do it. Finally, the three-node cluster is reduced to a single-node cluster.

matthijskrul · 2020-03-16T12:42:31Z

docs/scaling/downscaling.txt

+
+.. _scaling-down-starting-vanilla-cluster:
+
+Starting a Vanilla cluster


Is "Vanilla" a technical term here, or just meant in the sense of "generic"? If the latter, I'd write it in lowercase: "vanilla".

marregui · 2020-06-18T19:04:36Z

Hello @mechanomi,

I was going to work on this guide, but it has the blocked label. Should I get the guide to a mergeable state, or are the contents no longer relevant?

Thank you

nomicode · 2020-06-22T12:53:53Z

@marregui the blocked label was to indicate to me that you are blocked on this. but I changed how we're organizing the project board so I removed it

will get back to you on this soon

marregui · 2020-07-02T10:44:51Z

Excellent, thank you. I have pushed the latest version, so now it is down to reviewing and amending the PR.

nomicode · 2020-07-22T09:28:28Z

@marregui heya. I've taken a look! I have some high-level comments

I am not comfortable with the idea of shipping scripts in a directory that the reader has to clone. for a how-to guide, the scripts should be embedded in the text

however, it is fine to link to pastebin versions of them for ease of use. our current way of doing this is to use a pastebin service that is most appropriate to the type of code being used. the critical thing here, though, is that the full code is there in the doc. the pastebin doesn't have anything extra in it. this allows us to re-create pastebin links if we want to without risking losing any information

also: please don't use GitHub for this, or any other service that ties the paste to a specific username. we want to avoid the situation where an individual author disables their account on a third-party service some time in the future and takes down a bunch of pastes that we are using

so my first request is this: can you rewrite this so that you introduce each script as you go along. explain what it does, how it can be used, etc. fit it into the "story" you're telling in the how-to

for some of the scripts, I think you should consider dropping them entirely. for example, the one that installs CrateDB. a better solution here is to link to our existing installation guides or deployment guides and let the user figure this out for themselves. we should keep the how-to guide as tight and focused as possible

I also don't think it's necessary to include the complete configuration for each node. we should assume a default configuration and only give specific example configuration snippets when the user is required to make alterations to the default config

can you please restructure the how-to so that you don't introduce all the scripts at the beginning. the scripts, or config snippets, or code snippets should appear in the text at the point where they are needed. this helps keep the how-to readable and formats it more like a story (which is easier to read)

for the sample data, I would suggest two alternative methods:

- either point the reader at our tutorials showing them how to generate mock time series data using ISS telemetry information
- or show them how to use the cr8 tool (see https://crate.io/docs/crate/howtos/en/latest/performance/inserts/testing.html)

it's possible that this how-to becomes quite long with the changes I have requested. that's fine. as we continue to review and edit we can make the decision to split it up into multiple parts if that makes sense

marregui · 2020-07-22T13:56:26Z

thank you! sure, will do as you request.

out of curiosity, is pastebin this: https://pastebin.com/ ?

nomicode · 2020-07-22T14:05:53Z

@marregui that's one pastebin. but you should look around. there are other options. we should pick the one that is most used by the community (whichever community the doc is targeting)

marregui · 2020-07-23T10:00:57Z

Hi @mechanomi

I have deleted the scripts and inlined them where relevant. I have also revisited the prose to include hooks to other guides/documentation.

nomicode · 2020-07-27T11:27:31Z

@marregui can you let me know explicitly when you'd like me to take another look, please? I'm not sure from the comment you left. thanks!

marregui · 2020-07-27T12:56:24Z

hi @mechanomi could you please have another look? thank you in advance.

marregui · 2020-08-04T10:56:17Z

hi, I will be away next week. I was wondering about the chances of merging the PR this week. thank you in advance

nomicode · 2020-08-07T11:12:45Z

@marregui heya. sorry, I've been sick this week. I hope to have a second review ready for you by the time you're back from vacation

marregui · 2020-08-17T08:19:00Z

Hi @mechanomi I hope you are feeling better. I am back.

nomicode · 2020-08-20T12:32:12Z

@marregui there appears to be a downscaling.rst and a downscaling.txt file added. I assume that one of them is the file you want to add and the other is an older version of the same file. can you remove the older version? or indicate to me which one is the newest version so I know which file to edit

thank you!

The idea is to give readers a low entry point to clustering, from which they can expand their knowledge. A Vanilla cluster is a three-node cluster that runs on a single host. In this guide we create the cluster, add data to it, and then remove two nodes. Some scripts are required, which are available under the 'scripts' folder. The idea is that users download a zip containing them, or check out the repo to access them.

marregui · 2020-08-20T12:43:11Z

@mechanomi apologies for this, I have amended it. The correct document is *.rst.

nomicode · 2020-08-26T14:45:30Z

okay great!! thank you. this is looking much better now

since you started on this tutorial, the multi node cluster tutorial was updated quite significantly. check it out: https://crate.io/docs/crate/howtos/en/latest/clustering/multi-node-setup.html
can you update your tutorial so that it directs people there and uses the setup that is documented in that tutorial

if you need to suggest specific config changes, that's fine. but they should be config changes that can be applied to the cluster that the reader has set up following the previous tutorial

you can do away with the whole "Installing from sources" section. let's just provide the reader with one and only one way to do everything. and in this instance, we should direct them to the multi node cluster tutorial and work with the results of that. doing it this way will keep the tutorial easy to follow, easy to understand, and short

then, for the "Adding some data" section, let's go a similar route. you have already linked to the "generate time series tutorial", which is great. but then the next section goes on to detail how to generate and work with a different type of mock data

for the same reasons as before, I think it's probably best if we take out the stuff to do with csv logs. that includes the script to handle the data as well as the tables/queries

that should simplify that whole section as it's mostly just deleting stuff

for the "Exploring the Data" section, could you rework this so that you're exploring the data generated by the "generate time series data" tutorial running on the multi node cluster set up by the previous tutorial? that way, the tutorial you are writing is building on both of those previous tutorials

this will include switching away from working with the logs table to working with the iss table

hopefully, all of the above is relatively straightforward as it mostly involves deleting and/or slightly adapting what you already have. the real meat of the tutorial is the last section where you demonstrate how to downscale the cluster. and hopefully, the changes I am suggesting don't substantially affect that bit

what do you think?

marregui · 2020-08-26T14:50:18Z

Thank you for the feedback! It all sounds very reasonable, and I will proceed as you describe.

seut

Left a couple of comments. I'd suggest that someone who is not involved work through this to validate that everything is understandable and works.

seut · 2020-08-28T09:50:45Z