This repository was archived by the owner on Aug 2, 2021. It is now read-only.
network: Suggest peer by address space gap (#2065)
* network/kademlia: proposed solution for peer suggestion in Kademlia by using address space gaps. A thorough description can be found here: ethersphere/SWIPs#32
Co-authored-by: Álvaro <[email protected]>
network/README.md
+89-3
@@ -11,8 +11,8 @@ the latter on the downstream peer.
11
11
12
12
Subscribe on StreamerPeer launches an incoming streamer that sends
13
13
a subscribe msg upstream. The streamer on the upstream peer
14
-
handles the subscribe msg by installing the relevant outgoing streamer
15
-
. The modules now engage in a process of upstream sending a sequence of hashes of
14
+
handles the subscribe msg by installing the relevant outgoing streamer.
15
+
The modules now engage in a process of upstream sending a sequence of hashes of
16
16
chunks downstream (OfferedHashesMsg). The downstream peer evaluates which hashes are needed
17
17
and get it delivered by sending back a msg (WantedHashesMsg).
18
18
@@ -121,7 +121,7 @@ the constructor is the Run function itself. which takes a streamerpeer as argume
121
121
### provable streams
122
122
123
123
The swarm hash over the hash stream has many advantages. It implements a provable data transfer
124
-
and provide efficient storage for receipts in the form of inclusion proofs useable for finger pointing litigation.
124
+
and provide efficient storage for receipts in the form of inclusion proofs usable for finger pointing litigation.
125
125
When challenged on a missing chunk, upstream peer will provide an inclusion proof of a chunk hash against the state of the
126
126
sync stream. In order to be able to generate such an inclusion proof, upstream peer needs to store the hash index (counting consecutive hash-size segments) alongside the chunk data and preserve it even when the chunk data is deleted until the chunk is no longer insured.
127
127
if there is no valid insurance on the files the entry may be deleted.
@@ -150,3 +150,89 @@ and simply iterate on index per bin when syncing with a peer.
150
150
priority queues are used for sending chunks so that user triggered requests should be responded to first, session syncing second, and historical with lower priority.
151
151
The request on chunks remains implemented as a dataless entry in the memory store.
152
152
The lifecycle of this object should be more carefully thought through, ie., when it fails to retrieve it should be removed.
153
+
154
+
## Address space gaps
155
+
In order to optimize Kademlia load balancing, performance and peer suggestion, we define the concept of `address space gap`
156
+
or simply `gap`.
157
+
A `gap` is a portion of the overlay address space in which the current node does not know any peer. It could be represented
158
+
as a range of addresses, for example `0xxx`, meaning `0000-0111`.
159
+
160
+
The `proximity order of a gap` or `gap po` is the proximity order of that address space with respect to the nearest peer(s)
161
+
in the kademlia connected table (and considering also the current node address). For example if the node address is `0000`,
162
+
the gap of addresses `1xxx` has proximity order 0, whereas the gap `01xx` has proximity order 1.
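
The proximity order used above can be illustrated with a minimal sketch. It assumes addresses are plain integers of a fixed bit length and that two equal addresses get the maximum order `bits`; the function name `proximity_order` is hypothetical and is not the swarm API.

```python
def proximity_order(a: int, b: int, bits: int) -> int:
    """Count the leading bits shared by two `bits`-wide addresses.

    Assumption: equal addresses return `bits` (the maximum order).
    """
    diff = a ^ b
    if diff == 0:
        return bits
    # The highest set bit of the XOR marks the first position where they differ.
    return bits - diff.bit_length()


node = 0b0000
# Any address inside a gap gives the same result against the node,
# so one representative address per gap is enough.
po_gap_1xxx = proximity_order(node, 0b1000, 4)
po_gap_01xx = proximity_order(node, 0b0100, 4)
```

With node `0000`, a representative of `1xxx` yields order 0 and a representative of `01xx` yields order 1, matching the examples in the text.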
163
+
164
+
The `size of a gap` is defined as the number of addresses that could fit in it. If the area of the whole address space is 1,
165
+
the `size of a gap` could be defined from the `gap po` as `1 / 2 ^ (po + 1)`. For example, our previous `1xxx` gap has a size of
166
+
`1 / (2 ^ 1) = 1/2`. The size of `01xx` is `1 / (2 ^ 2) = 1/4`.
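
As a small sketch of the size formula, the fraction of the address space and the number of addresses in a gap can be computed from its po. The helper names `gap_size` and `gap_address_count` and the `bits` parameter (address length) are assumptions made for illustration.

```python
from fractions import Fraction


def gap_size(po: int) -> Fraction:
    """Fraction of the whole address space covered by a gap of the given po."""
    return Fraction(1, 2 ** (po + 1))


def gap_address_count(po: int, bits: int) -> int:
    """Number of `bits`-wide addresses that fit in a gap of the given po."""
    return 2 ** (bits - po - 1)


size_1xxx = gap_size(0)   # gap `1xxx`, po 0
size_01xx = gap_size(1)   # gap `01xx`, po 1
```

Using exact fractions avoids floating point rounding when gap sizes are compared.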
167
+
168
+
In order to improve the performance of content retrieval and delivery the node should minimize the size of its gaps, because this
169
+
means that it knows peers near almost all addresses. If the lowest `gap po` in the kademlia table is 4, it means that any
170
+
lookup or forwarding will reach a peer with proximity order of at least 4 to the target. On the other hand, if the node has a po 0 `gap`, it means that
171
+
for half of the addresses, the next jump will still be po 0 away!
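
To make the notion of the lowest `gap po` concrete, the following sketch enumerates the gaps for a set of known addresses by walking the binary prefix tree. It assumes addresses are given as bit strings of equal length and that the node's own address is included in `known`; `find_gaps` is a hypothetical helper, not part of the kademlia code.

```python
def find_gaps(known, bits):
    """Return (gap, po) pairs sorted by po, e.g. ("01xx", 1).

    A gap is a prefix with no known address whose sibling prefix is populated.
    """
    gaps = []

    def walk(prefix, addrs):
        if len(prefix) == bits:
            return
        for bit in "01":
            child = prefix + bit
            sub = [a for a in addrs if a.startswith(child)]
            if sub:
                walk(child, sub)
            else:
                # Gap po is the length of the prefix shared with the populated side.
                gaps.append((child.ljust(bits, "x"), len(child) - 1))

    walk("", list(known))
    return sorted(gaps, key=lambda g: g[1])


alone = find_gaps(["0000"], 4)
with_peer = find_gaps(["0000", "1000"], 4)
lowest_po_alone = alone[0][1]
lowest_po_with_peer = with_peer[0][1]
```

With only the node `0000` known, the lowest `gap po` is 0 (gap `1xxx`); once a peer `1000` is known, the lowest `gap po` rises to 1.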
172
+
173
+
### Gaps for peer suggestion
174
+
The current kademlia bootstrap algorithm tries to fill in the bins (or po spaces) until some level of saturation is reached.
175
+
In the process of doing that, the `gaps` will diminish, but not in the optimal way.
176
+
177
+
For example, suppose the node address is `00000000`, it is connected to only one peer in bin 0, `10000000`, and the known
178
+
addresses for bin 0 are `10000001` and `11000000`. The current algorithm will take the first `callable` one, so
179
+
for example, it may suggest `10000001` as next peer. This is not optimal, as the biggest `gap` in bin 0 will still be
180
+
po 1 => `11xxxxxx`. If, however, the algorithm is improved to search for a peer that covers a bigger `gap`, `11000000` would
181
+
be selected, and the biggest `gaps` in bin 0 would then be po 2 => `111xxxxx` and `101xxxxx`.
182
+
183
+
Additionally, even when the node does not know an address in a particular `gap`, it can still select the known address furthest away
184
+
from the current peers so it covers a bigger `gap`. In the previous example with node `00000000` and one peer already connected
185
+
`10000000`, if the known addresses are `10000001` and `10010000`, the best suggestion would be the last one, because it is po 3
186
+
from the nearest peer, as opposed to `10000001`, which is po 7 from it. The best suggestion covers a `gap` of po 3 size
187
+
(1/16 of area or 16 addresses) and the other one just po 7 size (1/256 area or 1 address).
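
The two examples in this section can be sketched as a selection rule: among the known addresses, prefer the one whose proximity order to its nearest connected peer (or to the node itself) is lowest, since it covers the biggest `gap`. This is only an illustration under stated assumptions: addresses are integers, `suggest_peer` and `covered_gap_size` are hypothetical names, and filtering for `callable` peers or bin saturation is left out.

```python
from fractions import Fraction


def proximity_order(a: int, b: int, bits: int) -> int:
    """Count the leading bits shared by two `bits`-wide addresses."""
    diff = a ^ b
    return bits if diff == 0 else bits - diff.bit_length()


def nearest_po(addr, node, peers, bits):
    """Proximity order of `addr` to the closest of the node and its peers."""
    return max(proximity_order(addr, c, bits) for c in [node, *peers])


def suggest_peer(node, peers, candidates, bits):
    """Pick the candidate that is furthest (lowest po) from everything connected."""
    return min(candidates, key=lambda c: nearest_po(c, node, peers, bits))


def covered_gap_size(addr, node, peers, bits):
    """Size of the gap a candidate would cover, as a fraction of the space."""
    return Fraction(1, 2 ** (nearest_po(addr, node, peers, bits) + 1))


node, peers = 0b00000000, [0b10000000]
first = suggest_peer(node, peers, [0b10000001, 0b11000000], 8)
second = suggest_peer(node, peers, [0b10000001, 0b10010000], 8)
```

In the first example the rule selects `11000000`, and in the second it selects `10010000`, which covers 1/16 of the address space compared to 1/256 for `10000001`.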
188
+
189
+
### Gaps and load balancing
190
+
One additional benefit in considering `gaps` is load balancing. If the target addresses are distributed randomly
191
+
(although address popularity is another problem that can also be studied from the `gap` perspective), the request will
192
+
be automatically load balanced if we try to connect to peers covering the bigger `gaps`. Continuing with our example,
193
+
if in bin 0 we have peers `10000000` and `10000001` (Fig. 1), almost all addresses in space `1xxxxxxx`, that is, half of the
194
+
addresses will have the same distance from both peers. If we need to send to one of those addresses we will need to use
195
+
one of those peers. This could be done randomly, always the first or with some load balancing accounting to use the least