Skip to content

[Zen2] Allow Setting a List of Bootstrap Nodes to Wait for #35847

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

original-brownbear
Copy link
Member

@DaveCTurner just making sure this is what you're looking for? :)

Specifically:

  • We want the list of nodes to be either addresses or names or a mix right?
  • Are you ok with the naming? :)

@original-brownbear original-brownbear added >enhancement WIP v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Nov 23, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@original-brownbear
Copy link
Member Author

Jenkins test this

@original-brownbear
Copy link
Member Author

@DaveCTurner ping :)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few suggestions. Not quite sure this is the right semantics but we can discuss that, and structurally it's ok.

Will the REST test changes be a followup?

@original-brownbear
Copy link
Member Author

@DaveCTurner I think I addressed all points :)

Will the REST test changes be a followup?

Yea, I guess that's still blocked by other work anyway right?

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my initial spec I forgot to mention that we expect to be able to match on IP address (without port), sorry. I also asked for a couple more tests.

for (final DiscoveryNode node : nodes) {
boolean matchedAddress = requiredNodes.remove(node.getAddress().toString());
boolean matchedName = requiredNodes.remove(node.getName());
if (matchedAddress && matchedName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, could we also match on the IP address part of the transport address (TransportAddress#getAddress()) (with tests, obvs)?

This opens up a slight complexity in this check-for-duplicates, because two nodes may share an IP address, so we need to consider this (i.e. have a test for it):

{node1}{10.20.30.40:9301}
{node2}{10.20.30.40:9302}

We should reject requiredNodes being "node1", "10.20.30.40" because 10.20.30.40 matches both, but the obvious change here wouldn't always pick it up.

});

assertTrue(countDownLatch.await(10, TimeUnit.SECONDS));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need (a) a test that matching on node name does work, and (b) a test that matching nonexistent addresses/names does fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a test that matching nonexistent addresses/names does fail.

Didn't we decide not to do that in Slack? I remember reading @ywelsch posting something to that effect on Monday?
Either way, in case we want to do this: This means we will actually restrict the allowed master nodes to the given list instead of merely waiting for at least the given nodes as it is implemented now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite what we discussed on Monday. What I mean is that with the PR as it is now (f8de0fc) the following change does not fail any tests, but it should:

diff --git a/server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/TransportGetDiscoveredNodesAction.java b/server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/TransportGetDiscoveredNodesAction.java
index c9c1d9e4e27..a8f4409d8b3 100644
--- a/server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/TransportGetDiscoveredNodesAction.java
+++ b/server/src/main/java/org/elasticsearch/action/admin/cluster/bootstrap/TransportGetDiscoveredNodesAction.java
@@ -141,6 +141,6 @@ public class TransportGetDiscoveredNodesAction extends HandledTransportAction<Ge
                 break;
             }
         }
-        return requiredNodes.isEmpty() && nodes.size() >= request.getWaitForNodes();
+        return nodes.size() >= request.getWaitForNodes();
     }
 }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah :) That makes sense => will fix in a sec

@@ -95,6 +100,26 @@ public TimeValue getTimeout() {
return timeout;
}

/**
* Sometimes it is useful only to receive a successful response after discovering a certain set of master-eligible nodes.
* This parameter gives the names, hostnames, or transport addresses of the expected nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you delete hostnames, here, we decided not to do this yet.


/**
* Sometimes it is useful only to receive a successful response after discovering a certain set of master-eligible nodes.
* This parameter gives the names, hostnames, or transport addresses of the expected nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you delete hostnames, here, we decided not to do this yet.

@original-brownbear
Copy link
Member Author

@DaveCTurner all points addressed I think:

  • Added tests by node name
  • Match by IP too allowing duplicates
  • Added tests for matching by IP
  • Added tests failing if ip + name is matched
  • Added test failing if min master node count is met but required nodes aren't
    • This one I wonder if we can make it nicer. I cringe a little when introducing a test that takes at least 1s but I don't see a way of testing this differently, is there a hook we can exploit so that we don't have to wait for a timeout running out?

@ywelsch ywelsch mentioned this pull request Nov 28, 2018
61 tasks
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some test suggestions and other minor stuff.

@@ -289,6 +297,150 @@ public void handleException(TransportException exp) {

assertTrue(countDownLatch.await(10, TimeUnit.SECONDS));
}

{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is getting a bit big, and it's hard to see what each block does. I think now we can see some clear patterns it'd make more sense for each block to be a separate test case with common bits extracted out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dried this up a little now to make the test nicer to read. Separating this out into multiple test cases didn't work (which I think is likely a bug).
Setting timeout zero via getDiscoveredNodesRequest.setTimeout(TimeValue.ZERO) fails the test unless one GetDiscoveredNodesRequest without that setting has run on the same transport service before. I can debug why but maybe better do that in a follow up?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, I think - the previous PeersRequest notifies the node of the existence of otherNode but not immediately, so there's a race. I think it's ok to start each such test with this to ensure it's set up correctly:

            final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
            getDiscoveredNodesRequest.setWaitForNodes(2);
            assertWaitConditionMet(getDiscoveredNodesRequest);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok :) breaking this up then, thanks!


@Override
public void handleException(TransportException exp) {
throw new AssertionError("should not be called");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is called, nearly immediately, because the timeout is zero, so the test doesn't need to wait. The AssertionError seems to be swallowed. Also in this code the countDownLatch is never called so awaiting on it doesn't seem to be useful (but it will be if we fix this).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is ever called here, why would it be, there is no error here? I'm testing that we're actually checking the master nodes list and won't just pass because the minimum master count is met to address this point #35847 (comment) (is there a better way to do this?).

Regardless, I admit the latch isn't really the right thing to do here though. Just sleeping 1s would do the same :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this into a normal wait now instead of the redundant latch but let me know if you know a better way :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put a breakpoint on that line, ran the test, and the breakpoint was hit. The exception was:

org.elasticsearch.transport.RemoteTransportException: [node1][0.0.0.0:1][cluster:monitor/discovered_nodes]
Caused by: org.elasticsearch.ElasticsearchTimeoutException: timed out while waiting for GetDiscoveredNodesRequest{waitForNodes=1, timeout=0s, requiredNodes=[0.0.0.0:1, _missing]}
	at org.elasticsearch.action.admin.cluster.bootstrap.TransportGetDiscoveredNodesAction$2.run(TransportGetDiscoveredNodesAction.java:117) [main/:?]
	at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:467) [main/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1135) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:844) [?:?]

I think this is right: we asked for an immediate response, but also to wait until node _missing was found, so ElasticsearchTimeoutException seems appropriate. Catching this and using a latch seems like the right thing to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right this works fine, not sure why this didn't work for me earlier ... must have been my mistake somehow => will adjust :)

@original-brownbear
Copy link
Member Author

@DaveCTurner done, addressed all points, though there's 2 open questions :)

@original-brownbear
Copy link
Member Author

@DaveCTurner last 2 points addressed:

  • Split up the tests
  • Fixed the waiting on a latch properly in the last one

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit uneasy with the implementation of checkWaitRequirements - particularly all the branching and state and stuff, and the fact that it specifically only looked for duplicate IP addresses. I found missing test cases by deleting lines and seeing what still passed, and also a test that I do not think should pass.

The following implementation reads better to me, and does the right thing on the test. WDYT?

    private static boolean matchesRequirement(DiscoveryNode discoveryNode, String requirement) {
        return discoveryNode.getName().equals(requirement)
            || discoveryNode.getAddress().toString().equals(requirement)
            || discoveryNode.getAddress().getAddress().equals(requirement);
    }

    private static boolean checkWaitRequirements(GetDiscoveredNodesRequest request, Set<DiscoveryNode> nodes) {
        if (nodes.size() < request.getWaitForNodes()) {
            return false;
        }

        final Set<DiscoveryNode> selectedNodes = new HashSet<>();
        for (final String requirement : request.getRequiredNodes()) {
            final Set<DiscoveryNode> matchingNodes
                = nodes.stream().filter(n -> matchesRequirement(n, requirement)).collect(Collectors.toSet());

            if (matchingNodes.isEmpty()) {
                return false;
            } else if (matchingNodes.size() > 1) {
                throw new IllegalArgumentException("[" + requirement + "] matches " + matchingNodes);
            }

            for (final DiscoveryNode matchingNode : matchingNodes) {
                if (selectedNodes.add(matchingNode) == false) {
                    throw new IllegalArgumentException("[" + matchingNode + "] matches " +
                        request.getRequiredNodes().stream().filter(r -> matchesRequirement(matchingNode, requirement))
                            .collect(Collectors.toList()));
                }
            }
        }

        return true;
    }

final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
getDiscoveredNodesRequest.setRequiredNodes(
Arrays.asList(localNode.getAddress().toString(), otherNode.getAddress().toString())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline nit :)

final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
getDiscoveredNodesRequest.setRequiredNodes(
Arrays.asList(localNode.getName(), otherNode.getName())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline nit :)

final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
getDiscoveredNodesRequest.setRequiredNodes(
Arrays.asList(localNode.getAddress().getAddress(), otherNode.getAddress().getAddress())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline nit :)

final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
getDiscoveredNodesRequest.setRequiredNodes(
Arrays.asList(localNode.getAddress().getAddress(), localNode.getAddress().toString())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline nit :)

final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
getDiscoveredNodesRequest.setRequiredNodes(
Arrays.asList(localNode.getAddress().toString(), localNode.getName())
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline nit :)

final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
getDiscoveredNodesRequest.setRequiredNodes(
Arrays.asList(localNode.getAddress().toString(), "_missing")
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline nit :)

assertWaitConditionMet(getDiscoveredNodesRequest);
}

public void testGetsDiscoveredNodesByIP() throws InterruptedException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems incorrect. The requirement here is ["0.0.0.0", "0.0.0.0"] so both nodes match both addresses so this should fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right because we put these in a Set so the duplication goes away ... maybe we want to just fail a duplicate IP setting right away?

Copy link
Member Author

@original-brownbear original-brownbear Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise a duplicate IP in the setting means blacklisting that IP ... that's weird? :D
(in fact this holds true for any duplicate entry not just IPs)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's right. Putting the same entry in twice is a misconfiguration, and bootstrapping is a dangerous time for a cluster, so I do not think we should be lenient.

@original-brownbear
Copy link
Member Author

@DaveCTurner

the implementation you suggest does not pass:

    public void testGetsDiscoveredNodesByIP() throws InterruptedException {
        setupGetDiscoveredNodesAction();
        final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
        getDiscoveredNodesRequest.setRequiredNodes(
            Arrays.asList(localNode.getAddress().getAddress()));
        getDiscoveredNodesRequest.setTimeout(TimeValue.ZERO);
        assertWaitConditionMet(getDiscoveredNodesRequest);
    }

when it should pass this, because we said we'd allow multiple nodes on the same IP didn't we?
So if we do that we need the special case handling for the IP?

@original-brownbear
Copy link
Member Author

@DaveCTurner Addressed all points. Made the check break out on the count condition like in your implementation and added a breakout on duplicate entries in the settings.
Let me know if I misunderstood something about the IPs being a special case. If they are then I think my implementation may be a little easier to read. If they aren't I think a symbiosis of both of our approaches may be easiest :D

@DaveCTurner
Copy link
Contributor

we said we'd allow multiple nodes on the same IP didn't we?

We allow it, but we shouldn't allow bootstrapping to be vague about which node on each IP we mean. If you are running multiple nodes on one host then you must be more specific, either using the address/port combination or else the node names.

@original-brownbear
Copy link
Member Author

@DaveCTurner ah ok got it :) Adjusted the logic accordingly now and pretty much used your impl. :)

private static boolean checkWaitRequirements(GetDiscoveredNodesRequest request, Set<DiscoveryNode> nodes) {
List<String> requirements = request.getRequiredNodes();
if (requirements.size() != new HashSet<>(requirements).size()) {
throw new IllegalArgumentException("There are duplicate entries in [cluster.initial_master_nodes]");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need this, and if you omit it then you get more useful messages (i.e. it specifies one of the problematic requirements).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, removed it and adjusted test accordingly.

@@ -93,7 +96,7 @@ public void accept(Iterable<DiscoveryNode> nodes) {
nodesSet.add(localNode);
nodes.forEach(nodesSet::add);
logger.trace("discovered {}", nodesSet);
if (nodesSet.size() >= request.getWaitForNodes() && listenerNotified.compareAndSet(false, true)) {
if (checkWaitRequirements(request, nodesSet) && listenerNotified.compareAndSet(false, true)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a try {...} catch (Exception e) { listenableFuture.onFailure(e); } around it, but the test to find this needs to interleave the steps of the other tests so is a bit ugly:

    public void testThrowsExceptionIfDuplicateDiscoveredLater() throws InterruptedException {
        new TransportGetDiscoveredNodesAction(Settings.EMPTY, EMPTY_FILTERS, transportService, coordinator); // registers action
        transportService.start();
        transportService.acceptIncomingRequests();
        coordinator.start();
        coordinator.startInitialJoin();

        final GetDiscoveredNodesRequest getDiscoveredNodesRequest = new GetDiscoveredNodesRequest();
        final String ip = localNode.getAddress().getAddress();
        getDiscoveredNodesRequest.setRequiredNodes(Collections.singletonList(ip));
        getDiscoveredNodesRequest.setWaitForNodes(2);

        final CountDownLatch countDownLatch = new CountDownLatch(1);
        transportService.sendRequest(localNode, GetDiscoveredNodesAction.NAME, getDiscoveredNodesRequest, new ResponseHandler() {
            @Override
            public void handleResponse(GetDiscoveredNodesResponse response) {
                throw new AssertionError("should not be called");
            }

            @Override
            public void handleException(TransportException exp) {
                Throwable t = exp.getRootCause();
                assertThat(t, instanceOf(IllegalArgumentException.class));
                assertThat(t.getMessage(), startsWith('[' + ip + "] matches ["));
                countDownLatch.countDown();
            }
        });

        threadPool.generic().execute(() ->
            transportService.sendRequest(localNode, REQUEST_PEERS_ACTION_NAME, new PeersRequest(otherNode, emptyList()),
                new TransportResponseHandler<PeersResponse>() {
                    @Override
                    public PeersResponse read(StreamInput in) throws IOException {
                        return new PeersResponse(in);
                    }

                    @Override
                    public void handleResponse(PeersResponse response) {
                        logger.info("response: {}", response);
                    }

                    @Override
                    public void handleException(TransportException exp) {
                        logger.info("exception", exp);
                    }

                    @Override
                    public String executor() {
                        return Names.SAME;
                    }
                }));

        assertTrue(countDownLatch.await(10, TimeUnit.SECONDS));
    }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added :) (tried making it a little nicer by drying it up against other tests)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, LGTM.

@original-brownbear original-brownbear merged commit 986bf52 into elastic:zen2 Nov 30, 2018
@original-brownbear original-brownbear deleted the zen2-initial-nodes-list branch November 30, 2018 17:53
@original-brownbear
Copy link
Member Author

@DaveCTurner thanks! (+sorry for the many back and forths!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. >enhancement v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants