Commit d56ede4

updated monitoring composition for local development (#21180)
1 parent 0b07b90 commit d56ede4

11 files changed, +172 -117 lines changed

doc/developer/tracing.md

+51 -7
@@ -144,16 +144,14 @@ _automatically_ collected and exported to analysis tools with _no additional wor
 ### Setup

 - For local setup, use the instructions here:
-<https://github.com/MaterializeInc/materialize/tree/main/misc/opentelemetry> to setup a local OpenTelemetry collector
-and ui to view traces.
-- TODO(guswynn): cloud honeycomb setup when its available
-- TODO(guswynn): link to demo video
+<https://github.com/MaterializeInc/materialize/tree/main/misc/monitoring> to set up a local OpenTelemetry collector
+(Tempo) and UI (Grafana) to view traces.

 ### Span visualization

-OpenTelemetry UI's like the [Jaeger] one in the local setup _are extraordinarly powerful tools for debugging_. They allow you to visualize
+OpenTelemetry UIs like the Grafana/Tempo one in the local setup _are extraordinarily powerful tools for debugging_. They allow you to visualize
 and inspect the exact control flow graph of your service, including the latency of each operation, and contextual recorded fields. See
-[Best Pratices](#best-practices) for more information.
+[Best Practices](#best-practices) for more information.

 ### Distributed tracing
 [OpenTelemetry] allows us to associate `tracing` spans _from distributed services_ with each other, allowing us to not only visualize
@@ -224,6 +222,52 @@ and `mz_ore::tracing::OpenTelemetryContext::attach_as_parent()` on the receiving
 - `TRACE`:
 - exceedingly verbose information intended only for local debugging/tracing

+## Accessing Tracing Data Locally
+
+To set up trace collection locally, start the `../../misc/monitoring` composition. It will spin up Tempo to
+automatically start storing traces, and Grafana to visualize them.
+
+Then start `./bin/environmentd --monitoring`.
+
+### Setting the Trace Filter
+
+By default, `./bin/environmentd` will only emit `INFO`-level spans. The filter is controlled through the
+`opentelemetry_filter` system variable and can be toggled dynamically:
+
+```
+> psql -U mz_system -h localhost -p 6877
+
+mz_system=> ALTER SYSTEM SET opentelemetry_filter="debug";
+```
+
+Or on startup:
+
+```
+./bin/environmentd --reset --monitoring -- --system-parameter-default='opentelemetry_filter=debug'
+```
+
+More details on [the filter syntax] can be found in Notion.
+
+### Trace ID Notices
+
+It's often valuable to see the traces associated with specific queries. This can be done
+by setting the session variable:
+
+```
+SET emit_trace_id_notice = true;
+```
+
+Then each subsequent query will emit a NOTICE containing the trace ID for its execution:
+
+```
+materialize=> SELECT 1;
+NOTICE: trace id: 65e87c160063307a9d1221f78ae55cf8
+```
+
+This trace ID can then be plugged into Grafana's TraceQL lookup for an exact match:
+
+![grafana tempo trace id lookup](./assets/grafana-tempo-trace-id-lookup.png)
+



@@ -240,6 +284,6 @@ and `mz_ore::tracing::OpenTelemetryContext::attach_as_parent()` on the receiving
 [many crates]: https://docs.rs/tracing/latest/tracing/#related-crates
 [OpenTelemetry]: https://opentelemetry.io/
 [here]: https://docs.rs/tracing/latest/tracing/struct.Span.html#in-asynchronous-code
-[Jaeger]: https://www.jaegertracing.io/
 [`tracing::Span`]: https://docs.rs/tracing/latest/tracing/struct.Span.html
 [the docs]: https://dev.materialize.com/api/rust/mz_ore/tracing/struct.OpenTelemetryContext.html
+[the filter syntax]: https://www.notion.so/materialize/Filtering-Logs-and-Traces-6e8fcce8f39e4b45b94ea2923cce05dc?pvs=4

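For scripted debugging, the trace ID notice described in the tracing docs can be pulled out of client output and turned into a direct Tempo lookup. This is a minimal Python sketch, not part of the commit. It assumes notices have the `trace id: <32 hex digits>` shape shown in the example, and that Tempo's trace-by-ID HTTP endpoint (`/api/traces/<id>`) is reachable on port `3200`, which the composition publishes. `parse_trace_id` and `tempo_trace_url` are hypothetical helper names.

```python
import re
from typing import Optional
from urllib.parse import quote

# Assumed notice shape: "NOTICE: trace id: <32 hex digits>".
# Only the "trace id:" part is matched, so a client-added "NOTICE:" prefix,
# extra spaces, or a trailing newline do not matter.
_TRACE_ID_RE = re.compile(r"trace id:\s*([0-9a-fA-F]{32})\b")


def parse_trace_id(notice: str) -> Optional[str]:
    """Return the trace ID in a notice (lowercased), or None if there is none."""
    match = _TRACE_ID_RE.search(notice)
    return match.group(1).lower() if match else None


def tempo_trace_url(trace_id: str, base: str = "http://localhost:3200") -> str:
    """Build a Tempo trace-by-ID URL; the default base assumes the local composition."""
    return f"{base}/api/traces/{quote(trace_id)}"


# Usage with the notice from the example in the docs.
trace_id = parse_trace_id("NOTICE: trace id: 65e87c160063307a9d1221f78ae55cf8")
print(trace_id)  # prints 65e87c160063307a9d1221f78ae55cf8
print(tempo_trace_url(trace_id))
```

With psycopg2, for instance, notices received on a connection accumulate as strings in `connection.notices`, so each entry could be passed to `parse_trace_id`. The Grafana TraceQL lookup remains the interactive route.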
misc/monitoring/README.md

+22 -10
@@ -1,8 +1,13 @@
 # Local monitoring composition

-An [mzcompose] composition for running Prometheus and Grafana locally. Metrics
-from all processes run by the `bin/environmentd` script will be automatically
-discovered and scraped.
+An [mzcompose] composition for running a minimal monitoring stack of Prometheus
+(metrics), Tempo (distributed tracing), and Grafana locally. Metrics from all
+processes run by the `bin/environmentd` script will be automatically discovered
+and collected, and traces will be collected when run with `--monitoring`:
+
+```
+./bin/environmentd --monitoring
+```

 ## Usage

@@ -11,17 +16,23 @@ cd misc/monitoring
 ./mzcompose run default
 ```

-To access Prometheus, run `./mzcompose web prometheus` or navigate directly to
-<http://localhost:9090> in your browser.
-
 To access Grafana, run `./mzcompose web grafana` or navigate directly to
-<http://localhost:3000> in your browser.
+<http://localhost:3000> in your browser. Grafana will have datasources
+for both Prometheus and Tempo by default.
+
+If needed, to access Prometheus directly, run `./mzcompose web prometheus`
+or navigate directly to <http://localhost:9090> in your browser.
+
+Tempo does not have its own UI and can only be accessed via Grafana.
+See [the tracing docs] for more info on how to filter and access tracing
+data.

 ## Modifications

-You can adjust the Prometheus configuration by editing the
-[prometheus.yml](./prometheus.yml) file in this directory. It is bind mounted
-into the container, so edits will be picked up the next time you restart:
+You can adjust the Prometheus and Tempo configurations by editing the
+[prometheus.yml](./prometheus.yml) and [tempo.yml](./tempo.yml) files in
+this directory. They are bind mounted into the containers, so edits will
+be picked up the next time you restart:

 ```
 ./mzcompose down -v
@@ -32,4 +43,5 @@ You can adjust the Grafana configuration through its UI. Beware that the
 Grafana state is ephemeral, and will be lost the next time you run
 `./mzcompose down`.

+[the tracing docs]: ../../doc/developer/tracing.md#accessing-tracing-data-locally
 [mzcompose]: ../../doc/developer/mzcompose.md

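Once `./mzcompose run default` has finished, a quick scripted check that all three services answer can save a trip to the browser. A hedged sketch, not part of the commit: it assumes the ports the composition publishes (9090, 3200, 3000) and uses each service's own health endpoint (Prometheus `/-/ready`, Tempo `/ready`, Grafana `/api/health`). `READY_URLS`, `is_ready`, and `check_stack` are hypothetical names. Tempo has no UI, but it does serve this small HTTP API on port 3200.

```python
from __future__ import annotations

import http.client
import urllib.request
from collections.abc import Mapping

# Health endpoints for the local stack; the ports are the ones the
# composition publishes. Assumes everything runs on localhost.
READY_URLS = {
    "prometheus": "http://localhost:9090/-/ready",
    "tempo": "http://localhost:3200/ready",
    "grafana": "http://localhost:3000/api/health",
}


def is_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx), refused connections, and timeouts are
        # OSError subclasses; malformed responses raise HTTPException.
        return False


def check_stack(urls: Mapping[str, str] = READY_URLS) -> dict[str, bool]:
    """Probe every service and report which ones are ready."""
    return {name: is_ready(url) for name, url in urls.items()}
```

For example, `print(check_stack())` prints one boolean per service. A service that is still starting simply shows `False`, so rerunning after a few seconds is expected.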
misc/monitoring/grafana/datasources/prometheus.yml

+2
@@ -12,3 +12,5 @@ datasources:
 - name: Prometheus
 type: prometheus
 url: http://prometheus:9090
+jsonData:
+timeInterval: 5s
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,3 @@
-#!/usr/bin/env bash
-
 # Copyright Materialize, Inc. and contributors. All rights reserved.
 #
 # Use of this software is governed by the Business Source License
@@ -8,7 +6,9 @@
 # As of the Change Date specified in that file, in accordance with
 # the Business Source License, use of this software will be governed
 # by the Apache License, Version 2.0.
-#
-# mzcompose — runs Docker Compose with Materialize customizations.

-exec "$(dirname "$0")"/../../bin/pyactivate -m materialize.cli.mzcompose "$@"
+apiVersion: 1
+datasources:
+- name: Tempo
+type: tempo
+url: http://tempo:3200

misc/monitoring/mzcompose.py

+24 -6
@@ -15,20 +15,37 @@
 Service(
 "prometheus",
 {
-"image": "prom/prometheus:v2.41.0",
+"image": "prom/prometheus:v2.46.0",
 "ports": ["9090:9090"],
 "volumes": [
 "./prometheus.yml:/etc/prometheus/prometheus.yml",
 "../../mzdata/prometheus:/mnt/services",
 ],
+"command": [
+"--config.file=/etc/prometheus/prometheus.yml",
+"--web.enable-remote-write-receiver",
+],
 "extra_hosts": ["host.docker.internal:host-gateway"],
 "allow_host_ports": True,
 },
 ),
+Service(
+"tempo",
+{
+"image": "grafana/tempo:2.2.0",
+"ports": ["4317:4317", "3200:3200"],
+"volumes": [
+"./tempo.yml:/etc/tempo.yml",
+"../../mzdata/tempo:/tmp/tempo",
+],
+"command": ["-config.file=/etc/tempo.yml"],
+"allow_host_ports": True,
+},
+),
 Service(
 "grafana",
 {
-"image": "grafana/grafana:9.3.2",
+"image": "grafana/grafana:10.0.3",
 "ports": ["3000:3000"],
 "environment": [
 "GF_AUTH_ANONYMOUS_ENABLED=true",
@@ -44,11 +61,12 @@


 def workflow_default(c: Composition) -> None:
-# Create the `mzdata/prometheus` directory that will be bind mounted into
-# the container before invoking Docker Compose, since otherwise the Docker
-# daemon will create the directory as root, and `environmentd` won't be
-# able to write to it.
+# Create the `mzdata/prometheus|tempo` directories that will be bind mounted into
+# the containers before invoking Docker Compose, since otherwise the Docker daemon
+# will create the directories as root, and `environmentd` won't be able to write to them.
 (MZ_ROOT / "mzdata" / "prometheus").mkdir(parents=True, exist_ok=True)
+(MZ_ROOT / "mzdata" / "tempo").mkdir(parents=True, exist_ok=True)
 c.up()
 print(f"Prometheus running at http://localhost:{c.default_port('prometheus')}")
+print(f"Tempo running at http://localhost:{c.default_port('tempo')}")
 print(f"Grafana running at http://localhost:{c.default_port('grafana')}")

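The `workflow_default` change relies on a general Docker behavior: a missing bind-mount source directory is created by the daemon, owned by root. The same pre-creation step, pulled out as a standalone sketch. `ensure_bind_mount_dirs` is a hypothetical helper, and its `root` argument stands in for `MZ_ROOT`.

```python
from pathlib import Path
from typing import Iterable, List


def ensure_bind_mount_dirs(root: Path, services: Iterable[str]) -> List[Path]:
    """Create <root>/mzdata/<service> for each service, owned by the current user.

    Run this before starting Docker Compose: if a bind-mount source is missing,
    the Docker daemon creates it as root, and processes on the host may then be
    unable to write into it.
    """
    paths = []
    for service in services:
        path = root / "mzdata" / service
        # parents=True also creates mzdata/; exist_ok=True makes reruns a no-op.
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths
```

Calling `ensure_bind_mount_dirs(MZ_ROOT, ["prometheus", "tempo"])` would match the two `mkdir` calls in the diff. Keeping the service names in one list makes it harder to forget a directory when another bind-mounted service is added.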
misc/monitoring/prometheus.yml

+8 -2
@@ -8,7 +8,7 @@
 # by the Apache License, Version 2.0.

 global:
-scrape_interval: 15s
+scrape_interval: 5s
 scrape_configs:
 - job_name: environmentd
 static_configs:
@@ -20,7 +20,7 @@ scrape_configs:
 file_sd_configs:
 - files:
 - /mnt/services/*.json
-refresh_interval: 30s
+refresh_interval: 5s
 relabel_configs:
 # Rewrite references to 127.0.0.1 or 0.0.0.0 to host.docker.internal,
 # since the services are running on the host.
@@ -44,3 +44,9 @@ scrape_configs:
 separator: "-"
 target_label: pod
 replacement: $1-0
+- job_name: prometheus
+static_configs:
+- targets: [ localhost:9090 ]
+- job_name: tempo
+static_configs:
+- targets: [ tempo:3200 ]

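The `environmentd` job discovers its targets from `/mnt/services/*.json`, which is the bind-mounted `mzdata/prometheus` directory. It uses Prometheus's file-based service discovery: each file holds a JSON list of target groups with `targets` and optional `labels`, refreshed every `refresh_interval` (now 5s). Below is a hedged sketch of writing such a file and of the host rewrite described in the relabel comment. The file name, port, and labels are made up, and `rewrite_for_container` only approximates the intent of the relabel rules, whose full regex is not visible in this diff.

```python
import json
import re
from pathlib import Path
from typing import Dict, List

# Approximates the relabel rule's intent: 127.0.0.1 and 0.0.0.0 refer to the
# host, which the Prometheus container reaches as host.docker.internal.
_HOST_LOCAL = re.compile(r"^(?:127\.0\.0\.1|0\.0\.0\.0)(:\d+)$")


def rewrite_for_container(target: str) -> str:
    """Map '127.0.0.1:PORT' or '0.0.0.0:PORT' to 'host.docker.internal:PORT'."""
    return _HOST_LOCAL.sub(r"host.docker.internal\1", target)


def write_file_sd(path: Path, targets: List[str], labels: Dict[str, str]) -> None:
    """Write a single target group in Prometheus's file_sd JSON format."""
    groups = [{"targets": targets, "labels": labels}]
    # Write a sibling file, then rename it over the target, so a reader never
    # sees a half-written file; the .tmp name does not match the *.json glob.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(groups, indent=2))
    tmp.replace(path)
```

The write-then-rename step matters because Prometheus may re-read the directory at any moment. A partially written JSON file would fail to parse until the next refresh.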
misc/monitoring/tempo.yml

+46
@@ -0,0 +1,46 @@
+# Copyright Materialize, Inc. and contributors. All rights reserved.
+#
+# Use of this software is governed by the Business Source License
+# included in the LICENSE file at the root of this repository.
+#
+# As of the Change Date specified in that file, in accordance with
+# the Business Source License, use of this software will be governed
+# by the Apache License, Version 2.0.
+
+server:
+http_listen_port: 3200
+
+distributor:
+receivers:
+otlp:
+protocols:
+grpc:
+
+ingester:
+max_block_duration: 5m
+
+compactor:
+compaction:
+block_retention: 15m
+
+metrics_generator:
+registry:
+external_labels:
+source: tempo
+cluster: docker-compose
+storage:
+path: /tmp/tempo/generator/wal
+remote_write:
+- url: http://prometheus:9090/api/v1/write
+send_exemplars: true
+
+storage:
+trace:
+backend: local
+wal:
+path: /tmp/tempo/wal
+local:
+path: /tmp/tempo/blocks
+
+overrides:
+metrics_generator_processors: [service-graphs, span-metrics]

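Beyond the Grafana UI, Tempo's HTTP API on port 3200 accepts TraceQL searches, which can be handy from scripts. A hedged sketch that only builds the request URL. It assumes Tempo's `/api/search` endpoint with its `q`, `start`, `end` (Unix seconds), and `limit` parameters. It also defaults the time window to the 15-minute `block_retention` set in `tempo.yml`, since older traces have likely been compacted away. `tempo_search_url` is a hypothetical helper, and `{ duration > 100ms }` is only an example query.

```python
import time
from typing import Optional
from urllib.parse import urlencode

# tempo.yml sets compactor.compaction.block_retention to 15m, so older
# traces are probably gone; search only that window by default.
DEFAULT_WINDOW_SECONDS = 15 * 60


def tempo_search_url(
    traceql: str,
    base: str = "http://localhost:3200",
    limit: int = 20,
    window_seconds: int = DEFAULT_WINDOW_SECONDS,
    now: Optional[float] = None,
) -> str:
    """Build a Tempo search URL for a TraceQL query over a recent time window."""
    end = int(time.time() if now is None else now)
    params = {
        "q": traceql,
        "start": end - window_seconds,  # Unix seconds
        "end": end,
        "limit": limit,
    }
    # urlencode percent-encodes the braces and operators in the TraceQL query.
    return f"{base}/api/search?{urlencode(params)}"
```

The resulting URL can be fetched with any HTTP client, for instance `urllib.request.urlopen`. The JSON response describes the matching traces, whose IDs can then be opened in Grafana.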
misc/opentelemetry/README.md

-62
This file was deleted.

misc/opentelemetry/mzcompose.py

-23
This file was deleted.
