This repository was archived by the owner on Aug 18, 2020. It is now read-only.

Single-machine multi-node mixed cluster CI prerequisites #4247

Merged: iohk-bors merged 8 commits into develop from serge/obft-mainnet-ci on Oct 29, 2019

@deepfire deepfire commented Oct 23, 2019

This supplies the changes necessary for a mixed-cluster integration test, as per IntersectMBO/cardano-node#255:

  1. mainnet_ci_full genesis & configuration, starting in OBFT mode
  2. a fix for an OBFT EBB rollback issue: the rollback tried to erase the EBB even when the chain was started in OBFT mode, leading to the failure reported in "Mixed cluster in CI" (IntersectMBO/cardano-node#255, comment)
  3. a change to network-transport-tcp making it more lenient about remote address claims: deepfire/network-transport-tcp@44f84a8. This is necessary to avoid problems when running multiple nodes on the same machine.
  4. small improvements to genesis generation

Additionally, this resets the protocol version for the shelley_staging_short_full configuration to 0 -- a prerequisite for its respin.

NOTE: perhaps this PR should be split. But then, this repository sees very little activity, so perhaps the separation wouldn't have much benefit. I don't have a strong opinion myself.


@intricate intricate left a comment


Good work! Just a few concerns.

I don't think this is worth splitting into multiple PRs. It's already quite small and your commits clearly separate the work done for each concern.

We should also request a review from someone else who worked on the OBFT code, such as @mhuesch or @erikd.

@intricate intricate requested review from erikd and mhuesch October 23, 2019 15:57
@deepfire deepfire force-pushed the serge/obft-mainnet-ci branch from 6743bc7 to 5530860 October 23, 2019 16:03
@deepfire deepfire requested a review from avieth October 23, 2019 16:07
avieth commented Oct 23, 2019

> change in network-transport-tcp to be more lenient regarding remote address claims: deepfire/network-transport-tcp@44f84a8. This is necessary to avoid problems when starting multiple nodes on the same machine.

What exactly is the problem? That patch to nt-tcp opens up a denial-of-service attack; the check was put there for a good reason.


@avieth avieth left a comment


network-transport-tcp changes are not acceptable

avieth commented Oct 23, 2019

If you really must disable the hostname check, you can turn it off via the tcpCheckPeerHost field of TCPParameters: http://hackage.haskell.org/package/network-transport-tcp-0.6.0/docs/Network-Transport-TCP.html#t:TCPParameters

> `tcpCheckPeerHost :: Bool`
>
> If True, new connections will be accepted only if the socket's host matches the host that the peer claims in its EndPointAddress. This is useful when operating on untrusted networks, because the peer could otherwise deny service to some victim by claiming the victim's address.

But beware: with the check disabled, your node will be vulnerable to that attack.

@deepfire deepfire force-pushed the serge/obft-mainnet-ci branch from 5530860 to 03c458b October 23, 2019 16:25
@deepfire

@avieth, here is an example of the failure:

```
Sep 09 01:46:48 a-node xadsbs7wlnm94wv14v9w8c0ba1dpjsyw-unit-script-cardano-node-legacy_-start[22055]: [cardano-sl.diffusion.outboundqueue.c-a-2:Warning:ThreadId 150] [2001-09-09 01:46:48.23 UTC] sending MsgAnnounceBlockHeader OriginSender to NodeId 10.1.0.3:3000:0 failed with TransportError ConnectFailed "setupRemoteEndPoint: Host mismatch. Claimed: 10.1.0.2; Numeric: 10.1.0.3; Resolved: c-b-1.cardano" :: SomeException
Sep 09 01:46:48 a-node xadsbs7wlnm94wv14v9w8c0ba1dpjsyw-unit-script-cardano-node-legacy_-start[22055]: [cardano-sl.diffusion.outboundqueue.c-a-2:Warning:ThreadId 149] [2001-09-09 01:46:48.23 UTC] sending MsgAnnounceBlockHeader OriginSender to NodeId 10.1.0.1:3000:0 failed with TransportError ConnectFailed "setupRemoteEndPoint: Host mismatch. Claimed: 10.1.0.2; Numeric: 10.1.0.1; Resolved: c-a-1.cardano" :: SomeException
Sep 09 01:46:48 a-node xadsbs7wlnm94wv14v9w8c0ba1dpjsyw-unit-script-cardano-node-legacy_-start[22055]: [cardano-sl.diffusion.outboundqueue.c-a-2:Warning:ThreadId 155] [2001-09-09 01:46:48.23 UTC] sending MsgAnnounceBlockHeader OriginSender to NodeId 10.1.0.7:3000:0 failed with TransportError ConnectFailed "setupRemoteEndPoint: Host mismatch. Claimed: 10.1.0.2; Numeric: 10.1.0.7; Resolved: c-d-1.cardano" :: SomeException
[root@a-node:~]# netstat -pltn | grep cardano
tcp        0      0 10.1.0.2:3000           0.0.0.0:*               LISTEN      22055/cardano-node-
tcp        0      0 10.1.0.5:3000           0.0.0.0:*               LISTEN      22078/cardano-node-
tcp        0      0 10.1.0.6:3000           0.0.0.0:*               LISTEN      22075/cardano-node-
tcp        0      0 10.1.0.7:3000           0.0.0.0:*               LISTEN      22068/cardano-node-
tcp        0      0 10.1.0.3:3000           0.0.0.0:*               LISTEN      22092/cardano-node-
tcp        0      0 10.1.0.4:3000           0.0.0.0:*               LISTEN      22083/cardano-node-
tcp        0      0 10.1.0.1:3000           0.0.0.0:*               LISTEN      22053/cardano-node-
[root@a-node:~]# host 10.1.0.1
1.0.1.10.in-addr.arpa domain name pointer c-a-1.cardano.

[root@a-node:~]# host 10.1.0.2
2.0.1.10.in-addr.arpa domain name pointer c-a-2.cardano.

[root@a-node:~]# host 10.1.0.3
3.0.1.10.in-addr.arpa domain name pointer c-b-1.cardano.

[root@a-node:~]# host 10.1.0.4
4.0.1.10.in-addr.arpa domain name pointer c-b-2.cardano.
```
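
The rejection in these logs can be sketched as a pure function. This is a simplified model inferred from the `Host mismatch` error text and the `tcpCheckPeerHost` documentation quoted earlier, not the network-transport-tcp source; the names `HostCheck` and `checkPeerHost` are hypothetical, and accepting a match against the resolved name is an assumption based on the error listing both values. Note that in every logged failure the numeric host seen by the accepting side equals the destination's own address (10.1.0.3, 10.1.0.1, 10.1.0.7) rather than the claimed 10.1.0.2, which is presumably what happens when connections between nodes sharing one machine get sourced from the destination address.

```haskell
-- Simplified, hypothetical model of a peer-host consistency check.
-- It is inferred from the "Host mismatch" error above; it is NOT the
-- network-transport-tcp implementation.
module Main (main) where

-- | Outcome of checking a peer's claimed host against its socket.
data HostCheck
  = HostOk
  | HostMismatch String String (Maybe String) -- claimed, numeric, resolved
  deriving (Eq, Show)

-- | Accept a connection only if the host the peer claims matches the
-- numeric socket address or its reverse-DNS name. When checking is
-- disabled (cf. tcpCheckPeerHost = False), every claim is accepted.
checkPeerHost
  :: Bool          -- whether checking is enabled
  -> String        -- host claimed by the peer
  -> String        -- numeric host of the accepted socket
  -> Maybe String  -- reverse-resolved name of that host, if any
  -> HostCheck
checkPeerHost False _ _ _ = HostOk
checkPeerHost True claimedHost numericHost resolvedHost
  | claimedHost == numericHost       = HostOk
  | Just claimedHost == resolvedHost = HostOk
  | otherwise = HostMismatch claimedHost numericHost resolvedHost

main :: IO ()
main = do
  -- c-a-2 claims 10.1.0.2, but the accepting side sees 10.1.0.3.
  print (checkPeerHost True "10.1.0.2" "10.1.0.3" (Just "c-b-1.cardano"))
  -- The same connection with checking disabled is accepted.
  print (checkPeerHost False "10.1.0.2" "10.1.0.3" (Just "c-b-1.cardano"))
```

The disabled branch shows why turning the check off is risky: the accepting side then trusts whatever host a peer claims, which is exactly the denial-of-service exposure the `tcpCheckPeerHost` documentation warns about.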

@deepfire

@avieth,

  1. dropped the network-transport-tcp change
  2. added a CLI option that controls the peer host address consistency check, accompanied by a suitably scary startup message and comments
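
A minimal sketch of such a switch, assuming a hypothetical `--no-check-peer-host` flag (the option name and warning text actually used in the PR are not shown here). It uses plain `getArgs` to stay dependency-free; a real node would more likely wire this through its existing option parser and presumably hand the resulting `Bool` to `tcpCheckPeerHost`.

```haskell
-- Hypothetical sketch: a command-line switch that disables the peer host
-- check and prints a loud warning at startup. The flag name is assumed.
module Main (main) where

import System.Environment (getArgs)
import System.IO (hPutStrLn, stderr)

-- | Hypothetical flag name; not necessarily the option added in the PR.
noCheckPeerHostFlag :: String
noCheckPeerHostFlag = "--no-check-peer-host"

-- | Split off the flag: returns whether the check stays enabled, plus the
-- remaining arguments. The check is on unless explicitly disabled.
parsePeerHostCheck :: [String] -> (Bool, [String])
parsePeerHostCheck args =
  ( noCheckPeerHostFlag `notElem` args
  , filter (/= noCheckPeerHostFlag) args
  )

-- | Startup warning emitted only when the check is disabled.
startupWarning :: Bool -> Maybe String
startupWarning True  = Nothing
startupWarning False = Just
  "WARNING: peer host checking is DISABLED. Peers may claim arbitrary \
  \addresses; only use this on trusted networks (e.g. single-machine CI)."

main :: IO ()
main = do
  (checkHost, rest) <- parsePeerHostCheck <$> getArgs
  -- mapM_ over a Maybe prints the warning only if there is one.
  mapM_ (hPutStrLn stderr) (startupWarning checkHost)
  putStrLn ("peer host check enabled: " ++ show checkHost)
  putStrLn ("remaining arguments: " ++ show rest)
```

Keeping the check enabled by default means only an explicit opt-in, such as a single-machine CI cluster, ever runs without it.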

@deepfire deepfire force-pushed the serge/obft-mainnet-ci branch from 0e8cb4c to f32a634 October 28, 2019 17:44

@intricate intricate left a comment


LGTM. Let's just await a green light from @avieth.

@deepfire

bors r+

iohk-bors bot added a commit that referenced this pull request Oct 28, 2019
4247: Single-machine multi-node mixed cluster CI prerequisites r=deepfire a=deepfire

This supplies the necessary changes for a mixed-cluster integration test, as per IntersectMBO/cardano-node#255 :

1. `mainnet_ci_full` genesis & configuration, starting in OBFT mode
2. fix for an OBFT EBB rollback issue, which was trying to erase EBB even if the chain was started in OBFT mode, leading to IntersectMBO/cardano-node#255 (comment)
3. change in `network-transport-tcp` to be more lenient regarding remote address claims: deepfire/network-transport-tcp@44f84a8.  This is necessary to avoid problems when starting multiple nodes on the same machine.
4. small improvements in genesis generation

Additionally, this resets the protocol version for the `shelley_staging_short_full` configuration to 0 -- a prerequisite for its respin.

*NOTE*: perhaps this PR should be split.  But then, this repository sees very little activity, so perhaps the separation wouldn't have much benefit.  I don't have a strong opinion myself.

Co-authored-by: Kosyrev Serge <[email protected]>
@deepfire

bors r-

@iohk-bors

iohk-bors bot commented Oct 28, 2019

Canceled

@deepfire

Waiting on @mhuesch's review of the EBB rollback change.


@mhuesch mhuesch left a comment


LGTM - the OBFT change seems reasonable & minimal.

```haskell
-- If we're starting up a chain in OBFT mode, we need to ensure
-- that we don't rollback the actual genesis block out of existence
-- (i.e. the EBB at epoch 0):
BlockHeaderGenesis _ -> unless (tipHeader ^. epochIndexL == 0) $ do
```

Yes, this looks completely reasonable. My memory is fuzzy, but I thought we had tested scenarios where the cluster started in OBFT mode. Perhaps I am misremembering; it seems like we would have hit the same snag that Serge encountered here.


@mhuesch: Yeah, we definitely had tested those scenarios before, but I think that after implementing your rollback solution we eventually accepted a trade-off: we would no longer be able to start up a chain directly in OBFT mode.
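
The guard in the diff excerpt can be illustrated with a toy chain model. This is a hypothetical sketch: `Header`, `Ebb`, `Regular` and `rollBackTipEbb` are invented stand-ins for cardano-sl's block header types and lenses, and only the EBB case shown in the excerpt is modelled. An EBB at the tip appears to be rolled back unless it is the epoch-0 EBB, i.e. the actual genesis block of a chain started in OBFT mode.

```haskell
-- Toy model of the OBFT rollback guard; types and names are hypothetical
-- stand-ins for cardano-sl's block headers.
module Main (main) where

-- | A block header, tagged with its epoch number.
data Header
  = Ebb Word      -- epoch boundary block (EBB) of the given epoch
  | Regular Word  -- ordinary block in the given epoch
  deriving (Eq, Show)

-- | Roll back an EBB sitting at the tip of the chain (newest block first),
-- unless it is the epoch-0 EBB: removing that one would roll the chain's
-- genesis block out of existence. Other tips are left unchanged, since
-- the excerpt only covers the EBB case.
rollBackTipEbb :: [Header] -> [Header]
rollBackTipEbb (Ebb epoch : rest)
  | epoch /= 0 = rest
rollBackTipEbb chain = chain

main :: IO ()
main = do
  -- Chain started directly in OBFT mode: only the genesis EBB exists.
  print (rollBackTipEbb [Ebb 0])
  -- A later EBB at the tip is rolled back.
  print (rollBackTipEbb [Ebb 2, Regular 1, Regular 1, Ebb 1])
```

Without the `epoch /= 0` guard, the first call would return an empty chain, the toy analogue of erasing the genesis EBB when the cluster starts in OBFT mode.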

@deepfire

bors r+

iohk-bors bot added a commit that referenced this pull request Oct 29, 2019
4247: Single-machine multi-node mixed cluster CI prerequisites r=deepfire a=deepfire

@iohk-bors

iohk-bors bot commented Oct 29, 2019

@iohk-bors iohk-bors bot merged commit f32a634 into develop Oct 29, 2019
@iohk-bors iohk-bors bot deleted the serge/obft-mainnet-ci branch October 29, 2019 17:03
iohk-bors bot added a commit to input-output-hk/cardano-byron-proxy that referenced this pull request Oct 30, 2019
59: Bump deps and update to changes in them r=deepfire a=deepfire

This:

1. Bumps dependencies to the version used in `cardano-node`,
2. Uses a bumped `cardano-sl` with input-output-hk/cardano-sl#4247
3. Updates to changes in `cardano-ledger` and `ouroboros-network`
4. Adds new dependencies to facilitate the above.

Co-authored-by: Kosyrev Serge <[email protected]>