This repository was archived by the owner on Aug 2, 2021. It is now read-only.

Commit e7e98cf

janoskortatu
authored and committed
network: Suggest peer by address space gap (#2065)
* network/kademlia: proposed solution for peer suggestion in Kademlia by using address space gaps. A thorough description can be found here: ethersphere/SWIPs#32 Co-authored-by: Álvaro <[email protected]>
1 parent 2c9e315 commit e7e98cf

File tree

6 files changed

+466
-17
lines changed


network/README.md

+89-3
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,8 @@ the latter on the downstream peer.
1111

1212
Subscribe on StreamerPeer launches an incoming streamer that sends
1313
a subscribe msg upstream. The streamer on the upstream peer
14-
handles the subscribe msg by installing the relevant outgoing streamer
15-
. The modules now engage in a process of upstream sending a sequence of hashes of
14+
handles the subscribe msg by installing the relevant outgoing streamer.
15+
The modules now engage in a process of upstream sending a sequence of hashes of
1616
chunks downstream (OfferedHashesMsg). The downstream peer evaluates which hashes are needed
1717
and get it delivered by sending back a msg (WantedHashesMsg).
1818

@@ -121,7 +121,7 @@ the constructor is the Run function itself. which takes a streamerpeer as argume
121121
### provable streams
122122

123123
The swarm hash over the hash stream has many advantages. It implements a provable data transfer
124-
and provide efficient storage for receipts in the form of inclusion proofs useable for finger pointing litigation.
124+
and provides efficient storage for receipts in the form of inclusion proofs usable for finger pointing litigation.
125125
When challenged on a missing chunk, upstream peer will provide an inclusion proof of a chunk hash against the state of the
126126
sync stream. In order to be able to generate such an inclusion proof, upstream peer needs to store the hash index (counting consecutive hash-size segments) alongside the chunk data and preserve it even when the chunk data is deleted until the chunk is no longer insured.
127127
if there is no valid insurance on the files the entry may be deleted.
@@ -150,3 +150,89 @@ and simply iterate on index per bin when syncing with a peer.
150150
priority queues are used for sending chunks so that user triggered requests should be responded to first, session syncing second, and historical with lower priority.
151151
The request on chunks remains implemented as a dataless entry in the memory store.
152152
The lifecycle of this object should be more carefully thought through, ie., when it fails to retrieve it should be removed.
153+
154+
## Address space gaps
155+
In order to optimize Kademlia load balancing, performance and peer suggestion, we define the concept of `address space gap`
156+
or simply `gap`.
157+
A `gap` is a portion of the overlay address space in which the current node does not know any peer. It could be represented
158+
as a range of addresses, for example `0xxx`, meaning `0000-0111`.
159+
160+
The `proximity order of a gap` or `gap po` is the proximity order of that address space with respect to the nearest peer(s)
161+
in the kademlia connected table (and considering also the current node address). For example if the node address is `0000`,
162+
the gap of addresses `1xxx` has proximity order 0, whereas the gap `01xx` has proximity order 1.
163+
164+
The `size of a gap` is defined as the number of addresses that could fit in it. If the area of the whole address space is 1,
165+
the `size of a gap` could be defined from the `gap po` as `1 / 2 ^ (po + 1)`. For example, our previous `1xxx` gap has a size of
166+
`1 / (2 ^ 1) = 1/2`. The size of `01xx` is `1 / (2 ^ 2) = 1/4`.
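
The size formula above can be checked with a few lines of Go. This is only a sketch: `gapSize` is a hypothetical helper name that is not part of the commit, and it takes the area of the whole address space to be 1, as the text does.

```go
package main

import "fmt"

// gapSize is a hypothetical helper (not part of the commit). It returns the
// fraction of the whole address space (taken as 1) covered by a gap of the
// given proximity order, following the README formula 1 / 2^(po+1).
func gapSize(po int) float64 {
	return 1 / float64(uint64(1)<<uint(po+1))
}

func main() {
	fmt.Println(gapSize(0)) // gap 1xxx seen from node 0000
	fmt.Println(gapSize(1)) // gap 01xx seen from node 0000
}
```

For the two gaps discussed above this yields 1/2 and 1/4 respectively.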
167+
168+
In order to improve the performance of content retrieval and delivery, the node should minimize the size of its gaps, because this
169+
means that it knows peers near almost all addresses. If the minimum `gap po` in the kademlia table is 4, it means that for any
170+
lookup or forwarding the next hop will be at proximity order 4 or higher from the target. On the other hand, if the node has a po 0 `gap`, it means that
171+
for half the addresses the next jump will still be only po 0 away.
172+
173+
### Gaps for peer suggestion
174+
The current kademlia bootstrap algorithm tries to fill in the bins (or po spaces) until some level of saturation is reached.
175+
In the process of doing that, the `gaps` will diminish, but not in the optimal way.
176+
177+
For example, if the node address is `00000000`, it is connected only with one peer in bin 0 `10000000` and the known
178+
addresses for bin 0 are: `10000001` and `11000000`. The current algorithm will take the first `callable` one, so
179+
for example, it may suggest `10000001` as next peer. This is not optimal, as the biggest `gap` in bin 0 will still be
180+
po 1 => `11xxxxxx`. If however, the algorithm is improved searching for a peer which covers a bigger `gap`, `11000000` would
181+
be selected and now the biggest `gaps` will be po 2 => `111xxxxx` and `101xxxxx`.
182+
183+
Additionally, even if the node does not know an address in a particular `gap`, it could still select the address furthest away
184+
from the current peers so it covers a bigger `gap`. In the previous example with node `00000000` and one peer already connected
185+
`10000000`, if the known addresses are `10000001` and `10010000`, the best suggestion would be the last one, because it is po 3
186+
from the nearest peer as opposed to `10000001` that is only po 7 away. The best case will cover a `gap` of po 3 size
187+
(1/16 of area or 16 addresses) and the other one just po 7 size (1/256 area or 1 address).
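
The fallback described above (pick the known address with the lowest proximity order to the nearest connected peer) can be sketched on 8-bit addresses as follows. The helpers `po` and `furthest` are hypothetical names used for illustration only; the real code works on `BzzAddr` values through `Pof`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// po is a hypothetical 8-bit proximity order: the number of leading bits
// two addresses share (8 when they are equal).
func po(a, b uint8) int {
	return bits.LeadingZeros8(a ^ b)
}

// furthest returns the candidate with the lowest proximity order to ref,
// that is, the one that covers the biggest part of the gap around ref.
// It assumes candidates is not empty.
func furthest(ref uint8, candidates []uint8) uint8 {
	best := candidates[0]
	for _, c := range candidates[1:] {
		if po(ref, c) < po(ref, best) {
			best = c
		}
	}
	return best
}

func main() {
	connected := uint8(0b10000000)
	known := []uint8{0b10000001, 0b10010000}
	for _, k := range known {
		fmt.Printf("%08b is po %d from %08b\n", k, po(connected, k), connected)
	}
	fmt.Printf("suggested: %08b\n", furthest(connected, known))
}
```

With the addresses of the example, `10010000` is selected because it is po 3 from `10000000`, while `10000001` is po 7.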
188+
189+
### Gaps and load balancing
190+
One additional benefit in considering `gaps` is load balancing. If the target addresses are distributed randomly
191+
(although address popularity is another problem that can also be studied from the `gap` perspective), the request will
192+
be automatically load balanced if we try to connect to peers covering the bigger `gaps`. Continuing with our example,
193+
if in bin 0 we have peers `10000000` and `10000001` (Fig. 1), almost all addresses in space `1xxxxxxx`, that is, half of the
194+
addresses will have the same distance from both peers. If we need to send to some of those address we will need to use
195+
one of those peers. This could be done randomly, always the first or with some load balancing accounting to use the least
196+
used one.
197+
![Fig. 1](https://raw.githubusercontent.com/kortatu/swarm_doc/master/address_space_gaps-lb-1.png)
198+
Fig. 1 - Peers close to each other need an external load balancing mechanism
199+
200+
This last method will still be useful, but if the `gap` filling strategy is used, most probably both peers will
201+
be separated enough that they never compete for an address and a natural load balancing will be made among them (for example,
202+
`10000000` and `11000000` will be used each for half the addresses in bin 0 (Fig. 2)).
203+
![Fig. 2](https://raw.githubusercontent.com/kortatu/swarm_doc/master/address_space_gaps-lb-2.png)
204+
Fig. 2 - Peers chosen by address space gap have a natural load balancing
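
The difference between the two configurations can be counted directly on 8-bit addresses. The sketch below is illustrative only: `po` and `split` are hypothetical helpers, and "closer" is taken to mean a higher proximity order to the target address.

```go
package main

import (
	"fmt"
	"math/bits"
)

// po is a hypothetical 8-bit proximity order (shared leading bits).
func po(a, b uint8) int { return bits.LeadingZeros8(a ^ b) }

// split walks every address in bin 0 of node 00000000 (that is, 1xxxxxxx)
// and counts how many are closer to p, closer to q, or equally close to both.
func split(p, q uint8) (toP, toQ, tied int) {
	for a := 0b10000000; a <= 0b11111111; a++ {
		dp, dq := po(uint8(a), p), po(uint8(a), q)
		switch {
		case dp > dq:
			toP++
		case dq > dp:
			toQ++
		default:
			tied++
		}
	}
	return toP, toQ, tied
}

func main() {
	fmt.Println(split(0b10000000, 0b10000001)) // peers next to each other (Fig. 1)
	fmt.Println(split(0b10000000, 0b11000000)) // peers chosen by gap (Fig. 2)
}
```

In the first configuration 126 of the 128 addresses are tied between both peers, so an external load balancing decision is needed. In the second there are no ties and each peer serves 64 addresses.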
205+
### Implementation
206+
The search for gaps can be done easily using a proximity order tree or `pot`. Traversing the bins of a node, a `gap` is
207+
found if some po is missing, starting from the furthest (left). At each level the starting po to search for is the
208+
parent po (not 0, because in the second level, under a node of po=0, the minimum po that could be found is 1).
209+
210+
The function that looks for the biggest gap in a `pot` is implemented in
211+
`pot.BiggestAddressGap`. That function returns the biggest gap in the form of a po and
212+
the node under which the gap can be found.
213+
214+
This function is used in `kademlia.suggestPeerInBinByGap`, which returns a `BzzAddr` in a particular bin that fills
215+
up the biggest address gap. This function is not yet used in `SuggestPeer`, but it would be enough to replace the call to
216+
`suggestPeerInBin` with the new one.
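
The traversal described in this section can be approximated without the `pot` package. The following is a simplified sketch, not the code of `pot.BiggestAddressGap`: it works on a flat slice of 8-bit addresses, and `po`, `biggestGap` and `addrBits` are hypothetical names. It assumes every address in `others` has a proximity order to `pin` greater than `parentPo`, which holds for peers taken from the same bin.

```go
package main

import (
	"fmt"
	"math/bits"
)

const addrBits = 8 // assumption: toy 8-bit addresses

func po(a, b uint8) int { return bits.LeadingZeros8(a ^ b) }

// biggestGap returns the lowest proximity order at which no peer is known
// (the biggest gap) and the address that order is measured from.
// pin is the reference peer, others are the remaining peers below it and
// parentPo is the proximity order of pin with respect to its parent.
func biggestGap(pin uint8, others []uint8, parentPo int) (int, uint8) {
	// Group the remaining peers by their proximity order to pin.
	groups := make(map[int][]uint8)
	for _, o := range others {
		p := po(pin, o)
		groups[p] = append(groups[p], o)
	}
	best, wrt := addrBits, pin
	for p := parentPo + 1; p < addrBits; p++ {
		g := groups[p]
		if len(g) == 0 {
			// Nothing known at this order; higher orders hold smaller gaps.
			if p < best {
				best, wrt = p, pin
			}
			break
		}
		// Descend below the first peer found at this order.
		if sub, subWrt := biggestGap(g[0], g[1:], p); sub < best {
			best, wrt = sub, subWrt
		}
	}
	return best, wrt
}

func main() {
	// Node 00000000 with one connected peer in bin 0.
	gapPo, ref := biggestGap(0b10000000, nil, 0)
	fmt.Printf("gap po %d under %08b\n", gapPo, ref)
	// After also connecting to 11000000.
	gapPo, ref = biggestGap(0b10000000, []uint8{0b11000000}, 0)
	fmt.Printf("gap po %d under %08b\n", gapPo, ref)
}
```

For the README example this reports a gap of po 1 under `10000000` (the `11xxxxxx` space) and, once `11000000` is connected, a gap of po 2. A return value equal to `addrBits` means no gap was found.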
217+
218+
### Further improvements
219+
Instead of the size of a gap, maybe it could be more interesting to see the ratio between size and number of current
220+
peers serving that gap. If we have `n` current peers that are equidistant to a particular gap of size `s`,
221+
the load of each of these peers will be on average `s/n`.
222+
We can define a gap's `temperature` as that number `s/n`. When looking for new peers to connect, instead of looking for
223+
bigger gaps we could look for `hotter` gaps.
224+
For example, if in our first example we can't find a peer in `11xxxxxx` and instead use the best available peer, we could end up
225+
with the configuration in Fig. 3.
226+
![Fig. 3](https://raw.githubusercontent.com/kortatu/swarm_doc/master/address_space_gaps-lb-3.png)
227+
Fig. 3 - Comparing gap temperatures
228+
229+
Here we still have `11xxxxxx` as the biggest gap (po=1, size 1/4), same size as `01xxxxxx`. But if we consider temperature,
230+
`01xxxxxx` is hotter because it is served only by our node `00000000`, so its temperature is `(1/4) / 1 = 1/4`. However,
231+
`11xxxxxx` is now served by two peers, so its temperature is `(1/4) / 2 = 1/8`, and that will mean that we will select
232+
`01xxxxxx` as the hotter one.
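
The comparison above is plain arithmetic and can be sketched as follows. `gapSize` and `temperature` are hypothetical helpers that are not part of the commit, and the sketch assumes at least one serving peer.

```go
package main

import "fmt"

// gapSize follows the README formula 1 / 2^(po+1).
func gapSize(po int) float64 {
	return 1 / float64(uint64(1)<<uint(po+1))
}

// temperature is the gap size divided by the number of peers that are
// equidistant to the gap (s/n in the text). servingPeers must be >= 1.
func temperature(po, servingPeers int) float64 {
	return gapSize(po) / float64(servingPeers)
}

func main() {
	fmt.Println("01xxxxxx:", temperature(1, 1)) // served only by our node
	fmt.Println("11xxxxxx:", temperature(1, 2)) // served by two peers
}
```

Both gaps have the same size, but the first one has twice the temperature of the second, so it would be filled first.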
233+
234+
There is a way of implementing the temperature calculation so that its cost is the same as looking for the biggest gap. Temperature
235+
can be calculated on the fly as the gap is found using a `pot`.
236+
237+
Other metrics could be considered in the temperature, such as the recent number of requests per address space, the performance of
238+
current peers, and so on.

network/kademlia.go

+61-12
Original file line numberDiff line numberDiff line change
@@ -427,28 +427,77 @@ func (k *Kademlia) SuggestPeer() (suggestedPeer *BzzAddr, saturationDepth int, c
427427
return false
428428
}
429429
}
430-
// curPO found
431-
// find a callable peer out of the addresses in the unsaturated bin
432-
// stop if found
433-
bin.ValIterator(func(val pot.Val) bool {
434-
e := val.(*entry)
435-
if k.callable(e) {
436-
suggestedPeer = e.BzzAddr
437-
return false
438-
}
439-
440-
return true
441-
})
430+
suggestedPeer = k.suggestPeerInBin(bin)
442431
return cur < len(bins) && suggestedPeer == nil
443432
}, true)
444433
}
434+
445435
if uint8(saturationDepth) < k.saturationDepth {
446436
k.saturationDepth = uint8(saturationDepth)
447437
return suggestedPeer, saturationDepth, true
448438
}
449439
return suggestedPeer, 0, false
450440
}
451441

442+
func (k *Kademlia) suggestPeerInBin(bin *pot.Bin) *BzzAddr {
443+
var foundPeer *BzzAddr
444+
// curPO found
445+
// find a callable peer out of the addresses in the unsaturated bin
446+
// stop if found
447+
bin.ValIterator(func(val pot.Val) bool {
448+
e := val.(*entry)
449+
if k.callable(e) {
450+
foundPeer = e.BzzAddr
451+
return false
452+
}
453+
return true
454+
})
455+
return foundPeer
456+
}
457+
458+
//suggestPeerInBinByGap tries to find the best peer to connect in a particular bin looking for the biggest
459+
//address gap in the current connections bin of same proximity order instead of using the first address that is
460+
//callable. In case there is no current bin of po = bin.ProximityOrder, or is empty, the usual suggestPeerInBin algorithm
461+
//will take place.
462+
//bin parameter is the bin in the addresses in which to select a BzzAddr
463+
//return value is the BzzAddr selected
464+
func (k *Kademlia) suggestPeerInBinByGap(bin *pot.Bin) *BzzAddr {
465+
connBin := k.defaultIndex.conns.PotWithPo(k.base, bin.ProximityOrder, Pof)
466+
if connBin == nil {
467+
return k.suggestPeerInBin(bin)
468+
}
469+
gapPo, gapVal := connBin.BiggestAddressGap()
470+
// I need an address in the missing gapPo space with respect to gapVal
471+
// the lower the gapPo, the bigger the address space gap
472+
var foundPeer *BzzAddr
473+
var candidatePeer *BzzAddr
474+
furthestPo := 256
475+
// find a callable peer out of the addresses in the unsaturated bin
476+
// stop if found
477+
bin.ValIterator(func(val pot.Val) bool {
478+
e := val.(*entry)
479+
addrPo, _ := Pof(gapVal, e.BzzAddr, bin.ProximityOrder)
480+
if k.callable(e) {
481+
if addrPo == gapPo {
482+
foundPeer = e.BzzAddr
483+
return false
484+
}
485+
if addrPo < furthestPo {
486+
furthestPo = addrPo
487+
candidatePeer = e.BzzAddr
488+
}
489+
return true
490+
}
491+
return true
492+
})
493+
if foundPeer != nil {
494+
return foundPeer
495+
} else {
496+
// No callable peer exactly gapPo away from gapVal was found, so we return the farthest candidate
497+
return candidatePeer
498+
}
499+
}
500+
452501
// On inserts the peer as a kademlia peer into the live peers
453502
func (k *Kademlia) On(p *Peer) (uint8, bool) {
454503
k.lock.Lock()

network/kademlia_test.go

+75
Original file line numberDiff line numberDiff line change
@@ -1069,3 +1069,78 @@ func TestCapabilityNeighbourhoodDepth(t *testing.T) {
10691069
t.Fatalf("cap 'one' expected depth 2, was %d", depth)
10701070
}
10711071
}
1072+
1073+
//TestSuggestPeerInBinByGap will check that when several addresses are available for register in the same bin, the
1074+
//one suggested is the one that fills the biggest gap of address in that bin.
1075+
func TestSuggestPeerInBinByGap(t *testing.T) {
1076+
tk := newTestKademlia(t, "11111111")
1077+
tk.Register("00000000", "00000001")
1078+
bin0 := tk.getAddressBin(0)
1079+
if bin0 == nil {
1080+
t.Errorf("Expected bin 0 in addresses to be found but is nil")
1081+
}
1082+
1083+
// Adding 00000000 for example, it doesn't really matter which of the first two
1084+
tk.On("00000000")
1085+
tk.Register("01000000")
1086+
suggestedByGapPeer := tk.suggestPeerInBinByGap(tk.getAddressBin(0))
1087+
binaryString := bzzAddrToBinary(suggestedByGapPeer)
1088+
// Expected suggestion is 01000000 because it covers bigger part of the address space in bin 0.
1089+
if binaryString != "01000000" {
1090+
t.Errorf("Expected suggestion by gap to be 01000000 because is in po=1 gap, but got %v", binaryString)
1091+
}
1092+
// Adding 01000000
1093+
tk.On(binaryString)
1094+
//Now we will try to fill in po 1
1095+
tk.Register("10000000", "11110000")
1096+
bin1 := tk.getAddressBin(1)
1097+
//Among the two peers, the first one (10000000) covers more gap than the other one in our kademlia table (it is farther from
1098+
// our base 11111111)
1099+
suggestedByGapPeer = tk.suggestPeerInBinByGap(bin1)
1100+
binaryString = bzzAddrToBinary(suggestedByGapPeer)
1101+
if binaryString != "10000000" {
1102+
t.Errorf("Expected suggestion by gap to be 10000000 because is in po=1 gap, but got %v", binaryString)
1103+
}
1104+
}
1105+
1106+
//TestSuggestPeerInBinByGapCandidate checks that when suggesting addresses, if an address in the desired gap can't be
1107+
//found, the furthest away from the reference peer will be chosen (the one with lower po so it will fill up a bigger
1108+
//part of the gap)
1109+
func TestSuggestPeerInBinByGapCandidate(t *testing.T) {
1110+
tk := newTestKademlia(t, "11111111")
1111+
tk.On("00000000", "10000000")
1112+
//Registering address (10000100) po=5 from 10000000 to leave a big gap [2..4]
1113+
tk.On("10000100")
1114+
//Now we are going to suggest a biggest gap that doesn't match with any of the available addresses. The algorithm
1115+
//should take the furthest from the reference address (parent of the gap, so 10000000)
1116+
//Now we have a gap po=2 under 10000000 in bin1. We are not going to register an address po=2 (f.ex. 10100000) but
1117+
//two addresses at po=3 and po=4 from it. Algorithm should return the farthest candidate(po=3).
1118+
//10010000 => po=3 from 10000000
1119+
//10001000 => po=4 from 10000000
1120+
tk.Register("10010000", "10001000")
1121+
suggestedCandidate := tk.suggestPeerInBinByGap(tk.getAddressBin(1))
1122+
binaryString := bzzAddrToBinary(suggestedCandidate)
1123+
if binaryString != "10010000" {
1124+
t.Errorf("Expected furthest candidate to be 10010000 at po=3, but got %v", binaryString)
1125+
}
1126+
}
1127+
1128+
//getAddressBin is a utility function to obtain a Bin by po
1129+
func (tk *testKademlia) getAddressBin(po int) *pot.Bin {
1130+
var theBin *pot.Bin
1131+
tk.defaultIndex.addrs.EachBin(tk.base, Pof, po, func(bin *pot.Bin) bool {
1132+
if bin.ProximityOrder == po {
1133+
theBin = bin
1134+
return false
1135+
} else if bin.ProximityOrder > po {
1136+
return false
1137+
} else {
1138+
return true
1139+
}
1140+
}, true)
1141+
return theBin
1142+
}
1143+
1144+
func bzzAddrToBinary(bzzAddress *BzzAddr) string {
1145+
return byteToBitString(bzzAddress.OAddr[0])
1146+
}

network_test.go

+3-2
Original file line numberDiff line numberDiff line change
@@ -352,13 +352,14 @@ func testSwarmNetwork(t *testing.T, o *testSwarmNetworkOptions, steps ...testSwa
352352

353353
for syncing := true; syncing; {
354354
syncing = false
355+
time.Sleep(1 * time.Second)
356+
355357
for _, id := range nodeIDs {
356358
if sim.MustNodeItem(id, bucketKeyInspector).(*api.Inspector).IsPullSyncing() {
357359
syncing = true
360+
break
358361
}
359362
}
360-
361-
time.Sleep(1 * time.Second)
362363
}
363364

364365
for {

pot/pot.go

+83
Original file line numberDiff line numberDiff line change
@@ -925,3 +925,86 @@ func (t *Pot) sstring(indent string) string {
925925
}
926926
return s
927927
}
928+
929+
//PotWithPo returns a Pot with all elements with proximity order desiredPo w.r.t. pivotVal.
930+
//It is similar to obtaining a bin, but in a tree structure that helps in some calculations
931+
func (t *Pot) PotWithPo(pivotVal Val, desiredPo int, pof Pof) *Pot {
932+
if t == nil || t.size == 0 {
933+
return nil
934+
}
935+
pivotProximityOrder, _ := pof(t.pin, pivotVal, 0)
936+
pivotPot, pivotBinIndex := t.getPos(pivotProximityOrder)
937+
if pivotProximityOrder < desiredPo {
938+
if pivotPot != nil && pivotPot.po == pivotProximityOrder {
939+
return pivotPot.PotWithPo(pivotVal, desiredPo, pof)
940+
} else { //There is no bin with the desired po
941+
return nil
942+
}
943+
}
944+
if pivotProximityOrder == desiredPo {
945+
prunedPot := t.clone()
946+
prunedPot.po = desiredPo
947+
actualPivotPlace := pivotBinIndex
948+
if pivotPot == nil {
949+
actualPivotPlace--
950+
}
951+
var removedBinsSize int
952+
for i := 0; i < len(prunedPot.bins) && i <= actualPivotPlace; i++ {
953+
removedBinsSize += prunedPot.bins[i].size
954+
}
955+
prunedPot.size = prunedPot.size - removedBinsSize
956+
if prunedPot.bins != nil {
957+
prunedPot.bins = prunedPot.bins[actualPivotPlace+1:]
958+
}
959+
return prunedPot
960+
}
961+
// if pivotProximityOrder > desiredPo
962+
for i := 0; i < len(t.bins); i++ {
963+
n := t.bins[i]
964+
if n.po == desiredPo {
965+
return n
966+
}
967+
}
968+
return nil
969+
}
970+
971+
//BiggestAddressGap tries to find the biggest address gap, that is, the largest part of the address space not covered by any element.
972+
//Biggest gaps tend to be top left of the tree (if the pot is rendered root top and bins with po = 0 left).
973+
//As the bins progress to the right or down (higher proximity order) the address space gap left is smaller.
974+
//An address gap is defined as a missing proximity order without any value. So for example, a root value with two
975+
//bins, one with po 0 and one with po 2 has a gap in po=1. Of course it also has a gap in po>=3 but that gap is smaller
976+
//in number of addresses contained. If the total space area is 1, the space covered by a bin of proximity order n can
977+
//be defined as 1/2^(n+1). So po=0 will occupy half of the area, po=5 1/64 of the area and so on.
978+
//When a gap is found there is no need to go further on that level because advancing (horizontally or vertically) will
979+
//decrease the maximum gap space by half.
980+
//The function returns the proximity order of the gap and the reference value where the gap has been found (so the
981+
//exact address set can be calculated)
982+
func (t *Pot) BiggestAddressGap() (po int, val Val) {
983+
if t == nil || t.size == 0 {
984+
return 0, nil
985+
}
986+
987+
if len(t.bins) == 0 {
988+
return t.po + 1, t.pin
989+
}
990+
991+
wrt := t.pin
992+
biggest := 256
993+
last := t.po
994+
for _, subPot := range t.bins {
995+
if subPot.po > last+1 && last+1 <= biggest {
996+
wrt = t.pin
997+
biggest = last + 1
998+
break
999+
} else {
1000+
last = subPot.po
1001+
subBiggest, aVal := subPot.BiggestAddressGap()
1002+
if subBiggest < biggest {
1003+
biggest = subBiggest
1004+
wrt = aVal
1005+
}
1006+
}
1007+
}
1008+
1009+
return biggest, wrt
1010+
}
