
swarm: minimal set of metrics #1910

Closed
@marten-seemann

The objective is to define a minimal set of metrics that the swarm should expose. There are many interesting things we could measure (and in the long term we should; see #1356), but we need to start somewhere.

I propose to define the minimal set as those metrics that allow us to measure the effect of our smarter dialing logic (#1785).

Suggested metrics:

  • conns opened (grouped by: direction, transport / security / muxer)
  • conns closed (grouped by: direction, transport / security / muxer)
  • conn duration histogram: duration between handshake completion and closing (grouped by: transport / security / muxer)
  • dial duration histogram
  • dial errors (grouped by: error type (timeout, cancellation, other), transport / security / muxer)

When implementing smarter dialing, we expect:

  • dial errors caused by cancellation to decrease dramatically
  • the dial duration to increase only slightly (otherwise the new dialing logic wouldn't be very smart)
  • the number of very short-lived incoming connections to decrease (as other nodes upgrade)
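The expectations above compare the swarm's behavior before and after the change. Because two measurement windows rarely see the same traffic volume, comparing rates rather than raw counts seems safer. The following Go sketch assumes hypothetical per-window counter deltas (`Snapshot`) and hypothetical thresholds: `minCancelDrop` and `maxDialSlowdown` are illustrative stand-ins for "dramatically" and "slightly", not values taken from this issue.

```go
package main

// Snapshot holds hypothetical counter deltas observed over one measurement
// window (e.g. a day of traffic) for a given version of the dialing logic.
type Snapshot struct {
	DialAttempts      uint64
	DialCancellations uint64  // dial errors classified as "cancellation"
	DialSecondsSum    float64 // sum of all dial durations, in seconds
	InboundClosed     uint64  // inbound connections closed in the window
	InboundShortLived uint64  // of those, closed within some short lifetime bound
}

func ratio(num, den float64) float64 {
	if den == 0 {
		return 0
	}
	return num / den
}

// Rates normalizes a snapshot so windows with different traffic volumes compare fairly.
func (s Snapshot) Rates() (cancelRate, meanDial, shortLivedShare float64) {
	attempts := float64(s.DialAttempts)
	return ratio(float64(s.DialCancellations), attempts),
		ratio(s.DialSecondsSum, attempts),
		ratio(float64(s.InboundShortLived), float64(s.InboundClosed))
}

// MeetsExpectations checks the three expectations from the issue.
// minCancelDrop (0.5 = cancellation rate at least halved) and
// maxDialSlowdown (0.1 = mean dial at most 10% slower) are hypothetical thresholds.
func MeetsExpectations(before, after Snapshot, minCancelDrop, maxDialSlowdown float64) bool {
	cb, db, sb := before.Rates()
	ca, da, sa := after.Rates()
	return ca <= cb*(1-minCancelDrop) &&
		da <= db*(1+maxDialSlowdown) &&
		sa < sb
}
```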

Labels

  • P0 (Critical: tackled by core team ASAP)
  • effort/days (estimated to take multiple days, but less than a week)
  • exp/intermediate (prior experience is likely helpful)
