Commit a31e046

author: Serhii Zakharov (committed)
implemented fetching gathering rules from a remote service
1 parent acf186f · commit a31e046

17 files changed: +927 −343 lines changed

Diff for: README.md (+32 −29)

Note: several hunks in this file differ only in whitespace (trailing-whitespace cleanup), so their removed and added lines render identically below.

````diff
@@ -1,7 +1,7 @@
 # Insights Operator
 
-This cluster operator gathers anonymized system configuration and reports it to Red Hat Insights. It is a part of the
-standard OpenShift distribution. The data collected allows for debugging in the event of cluster failures or
+This cluster operator gathers anonymized system configuration and reports it to Red Hat Insights. It is a part of the
+standard OpenShift distribution. The data collected allows for debugging in the event of cluster failures or
 unanticipated errors.
 
 # Table of Contents
@@ -60,7 +60,7 @@ Unit tests can be started by the following command:
 make test
 ```
 
-It is also possible to specify CLI options for Go test. For example, if you need to disable test results caching,
+It is also possible to specify CLI options for Go test. For example, if you need to disable test results caching,
 use the following command:
 
 ```shell script
@@ -72,8 +72,8 @@ VERBOSE=-count=1 make test
 # Documentation
 
 
-The document [docs/gathered-data](docs/gathered-data.md) contains the list of collected data and the API that is used
-to collect it. This documentation is generated by the command bellow, by collecting the comment tags located above
+The document [docs/gathered-data](docs/gathered-data.md) contains the list of collected data and the API that is used
+to collect it. This documentation is generated by the command bellow, by collecting the comment tags located above
 each Gather method.
 
 To start generating the document run:
@@ -86,12 +86,12 @@ make docs
 
 ## Generate the certificate and key
 
-Certificate and key are required to access Prometheus metrics (instead 404 Forbidden is returned). It is possible
-to generate these two files from Kubernetes config file. Certificate is stored in `users/admin/client-cerfificate-data`
-and key in `users/admin/client-key-data`. Please note that these values are encoded by using Base64 encoding,
+Certificate and key are required to access Prometheus metrics (instead 404 Forbidden is returned). It is possible
+to generate these two files from Kubernetes config file. Certificate is stored in `users/admin/client-cerfificate-data`
+and key in `users/admin/client-key-data`. Please note that these values are encoded by using Base64 encoding,
 so it is needed to decode them, for example by `base64 -d`.
 
-There's a tool named `gen_cert_key.py` that can be used to automatically generate both files. It is stored in `tools`
+There's a tool named `gen_cert_key.py` that can be used to automatically generate both files. It is stored in `tools`
 subdirectory.
 
 ```shell script
@@ -100,10 +100,10 @@ gen_cert_file.py kubeconfig.yaml
 
 ## Prometheus metrics provided by Insights Operator
 
-It is possible to read Prometheus metrics provided by Insights Operator. Example of metrics exposed by
+It is possible to read Prometheus metrics provided by Insights Operator. Example of metrics exposed by
 Insights Operator can be found at [metrics.txt](docs/metrics.txt)
 
-Depending on how or where the IO is running you may have different ways to retrieve the metrics.
+Depending on how or where the IO is running you may have different ways to retrieve the metrics.
 Here is a list of some options, so you can find the one that fits you:
 
 ### Running IO locally
@@ -140,22 +140,20 @@ curl --cert k8s.crt --key k8s.key -k 'https://prometheus-k8s.openshift-monitori
 
 ## Debugging Prometheus metrics without valid CA
 
-Get the token
+1. Forward the service
 
-```shell script
-oc sa get-token prometheus-k8s -n openshift-monitoring
+```bash
+sudo kubefwd svc -n openshift-monitoring -d openshift-monitoring.svc -l prometheus=k8s
 ```
 
-Change in `pkg/controller/operator.go` after creating `metricsGatherKubeConfig` (about line #86)
+2. Set `INSECURE_PROMETHEUS_TOKEN` environment variable:
 
-```go
-metricsGatherKubeConfig.Insecure = true
-metricsGatherKubeConfig.BearerToken = "YOUR-TOKEN-HERE"
-# by default CAFile is /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
-metricsGatherKubeConfig.CAFile = ""
-metricsGatherKubeConfig.CAData = []byte{}
+```bash
+export INSECURE_PROMETHEUS_TOKEN=$(oc sa get-token prometheus-k8s -n openshift-monitoring)
 ```
 
+3. Run the operator.
+
 # Debugging
 
 ## Using the profiler
@@ -185,7 +183,7 @@ go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
 go tool pprof http://localhost:6060/debug/pprof/heap
 ```
 
-These commands will create a compressed file that can be visualized using a variety of tools, one of them is
+These commands will create a compressed file that can be visualized using a variety of tools, one of them is
 the `pprof` tool.
 
 ### Analyzing profiling data
@@ -213,7 +211,7 @@ It uses both the local git and GitHub`s API to update the file so:
 
 It can be used 2 ways:
 
-1. Providing no command line arguments the script will update the current `CHANGELOG.md` with the latest changes
+1. Providing no command line arguments the script will update the current `CHANGELOG.md` with the latest changes
 according to the local git state.
 
 > 🚨 IMPORTANT: It will only work with changelogs created with this script
@@ -222,7 +220,7 @@ It can be used 2 ways:
 go run cmd/changelog/main.go
 ```
 
-2. Providing 2 command line arguments, `AFTER` and `UNTIL` dates the script will generate a new `CHANGELOG.md` within
+2. Providing 2 command line arguments, `AFTER` and `UNTIL` dates the script will generate a new `CHANGELOG.md` within
 the provided time frame.
 
 ```shell script
@@ -235,17 +233,17 @@ go run cmd/changelog/main.go 2021-01-10 2021-01-20
 * ClusterOperator objects
 * All non-secret global config (hostnames and URLs anonymized)
 
-The list of all collected data with description, location in produced archive and link to Api and some examples is
+The list of all collected data with description, location in produced archive and link to Api and some examples is
 at [docs/gathered-data.md](docs/gathered-data.md)
 
-The resulting data is packed in `.tar.gz` archive with folder structure indicated in the document. Example of such
+The resulting data is packed in `.tar.gz` archive with folder structure indicated in the document. Example of such
 archive is at [docs/insights-archive-sample](docs/insights-archive-sample).
 
 ## Insights Operator Archive
 
 ### Sample IO archive
 
-There is a sample IO archive maintained in this repo to use as a quick reference. (can be found
+There is a sample IO archive maintained in this repo to use as a quick reference. (can be found
 at [docs/insights-archive-sample](https://github.com/openshift/insights-operator/tree/master/docs/insights-archive-sample))
 
 To keep it up-to-date it is **required** to update this manually when developing a new data enhancement.
@@ -311,8 +309,13 @@ the `managedFields` field when it was removed from the IO archive to save space:
 ./scripts/update_sample_archive.sh <Path of directory with the NEW extracted IO archive> '"managedFields":'
 ```
 
-The path of the sample archive directory should be constant relative to
-the path of the script and therefore does not have to be specified explicitly.
+The path of the sample archive directory should be constant relative to the path of the script and therefore does not
+have to be specified explicitly.
+
+# Conditional Gathering
+
+See [docs/conditional-gatherer/README.md](https://github.com/openshift/insights-operator/blob/master/docs/conditional-gatherer/README.md)
+
 
 # Contributing
````
Diff for: config/local.yaml (+1)

```diff
@@ -5,6 +5,7 @@ leaderElection:
 interval: "5m"
 storagePath: /tmp/insights-operator
 endpoint: http://[::1]:8081
+conditionalGathererEndpoint: https://console.redhat.com/api/gathering/gathering_rules
 impersonate: system:serviceaccount:openshift-insights:gather
 gather:
 - ALL
```

Diff for: config/pod.yaml (+1)

```diff
@@ -5,6 +5,7 @@ leaderElection:
 interval: "2h"
 storagePath: /var/lib/insights-operator
 endpoint: https://cloud.redhat.com/api/ingress/v1/upload
+conditionalGathererEndpoint: https://console.redhat.com/api/gathering/gathering_rules
 impersonate: system:serviceaccount:openshift-insights:gather
 pull_report:
   endpoint: https://cloud.redhat.com/api/insights-results-aggregator/v1/clusters/%s/report
```

Diff for: docs/conditional-gatherer/README.md (+123, new file — content shown below)

# Conditional Gatherer

The conditional gatherer is a special gatherer that uses a set of rules describing which gathering functions to activate.
More details can be found in `pkg/gatherers/conditional/conditional_gatherer.go`.

## Manual Testing

To test that the conditional gatherer provides some data, follow these steps:

1. Downscale CVO:
```bash
oc scale deployment -n openshift-cluster-version cluster-version-operator --replicas=0
```

2. Back up the prometheus rules:
```bash
oc get prometheusrule -n openshift-cluster-samples-operator samples-operator-alerts -o json > prometheus-rules.back.json
```

3. Make the `SamplesImagestreamImportFailing` alert fire by setting its `expr` value to `1 > bool 0` and `for` to `1s`:
```bash
echo '{
  "apiVersion": "monitoring.coreos.com/v1",
  "kind": "PrometheusRule",
  "metadata": {
    "name": "samples-operator-alerts",
    "namespace": "openshift-cluster-samples-operator"
  },
  "spec": {
    "groups": [
      {
        "name": "SamplesOperator",
        "rules": [
          {
            "alert": "SamplesImagestreamImportFailing",
            "annotations": {
              "message": "Always firing"
            },
            "expr": "1 > bool 0",
            "for": "1s",
            "labels": {
              "severity": "warning"
            }
          }
        ]
      }
    ]
  }
}' | oc apply -f -
```

4. Wait for the alert to fire:
```bash
export ALERT_MANAGER_HOST=$(oc get route alertmanager-main -n openshift-monitoring -o jsonpath='{@.spec.host}')
export INSECURE_PROMETHEUS_TOKEN=$(oc sa get-token prometheus-k8s -n openshift-monitoring)
curl -k -H "Authorization: Bearer $INSECURE_PROMETHEUS_TOKEN" https://$ALERT_MANAGER_HOST/api/v1/alerts | \
  jq '.data[] | select(.labels.alertname == "SamplesImagestreamImportFailing")'
```

5. Make metrics work by forwarding the endpoint and setting the `INSECURE_PROMETHEUS_TOKEN` environment variable:
```bash
export INSECURE_PROMETHEUS_TOKEN=$(oc sa get-token prometheus-k8s -n openshift-monitoring)
```
```bash
# run this command in a separate terminal
sudo kubefwd svc -n openshift-monitoring -d openshift-monitoring.svc -l prometheus=k8s --kubeconfig $KUBECONFIG
```

6. Run the operator and wait for an archive containing the `conditional/` directory.

7. Restore the backup:
```bash
oc apply -f prometheus-rules.back.json
```

8. Scale CVO back up:
```bash
oc scale deployment -n openshift-cluster-version cluster-version-operator --replicas=1
```

## Using Locally Started Service

1. Run the service following the instructions at
https://github.com/RedHatInsights/insights-operator-gathering-conditions-service
2. Set `conditionalGathererEndpoint` in `config/local.yaml` to `http://localhost:8081/api/gathering/gathering_rules`
3. Enjoy your conditional rules from the local service

## Using a Mock Server

1. Start a mock server:
```bash
git clone https://github.com/RedHatInsights/insights-operator-gathering-conditions.git
cd insights-operator-gathering-conditions/
./build.sh
python3 -m http.server --directory build/
```

2. Set `conditionalGathererEndpoint` in `config/local.yaml` to `http://localhost:8000/rules.json`
3. Enjoy your conditional rules from the mock service

## Using Stage Endpoint

0. Be connected to the Red Hat network or configure a proxy for the stage version of console.redhat.com
1. Set up the stage endpoint in `config/local.yaml`
2. Configure authentication through the `support` secret:
```bash
echo '{
  "apiVersion": "v1",
  "kind": "Secret",
  "metadata": {
    "namespace": "openshift-config",
    "name": "support"
  },
  "type": "Opaque",
  "data": {
    "username": "'$(echo $STAGE_USERNAME | base64 --wrap=0)'",
    "password": "'$(echo $STAGE_PASSWORD | base64 --wrap=0)'"
  }
}' | oc apply -f -
```

3. Enjoy your conditional rules from the stage endpoint

Diff for: pkg/config/config.go (+27 −14)

```diff
@@ -11,11 +11,12 @@ import (
 
 // Serialized defines the standard config for this operator.
 type Serialized struct {
-	Report      bool   `json:"report"`
-	StoragePath string `json:"storagePath"`
-	Interval    string `json:"interval"`
-	Endpoint    string `json:"endpoint"`
-	PullReport  struct {
+	Report                      bool   `json:"report"`
+	StoragePath                 string `json:"storagePath"`
+	Interval                    string `json:"interval"`
+	Endpoint                    string `json:"endpoint"`
+	ConditionalGathererEndpoint string `json:"conditionalGathererEndpoint"`
+	PullReport                  struct {
 		Endpoint string `json:"endpoint"`
 		Delay    string `json:"delay"`
 		Timeout  string `json:"timeout"`
@@ -33,15 +34,16 @@ type Serialized struct {
 
 // Controller defines the standard config for this operator.
 type Controller struct {
-	Report               bool
-	StoragePath          string
-	Interval             time.Duration
-	Endpoint             string
-	ReportEndpoint       string
-	ReportPullingDelay   time.Duration
-	ReportMinRetryTime   time.Duration
-	ReportPullingTimeout time.Duration
-	Impersonate          string
+	Report                      bool
+	StoragePath                 string
+	Interval                    time.Duration
+	Endpoint                    string
+	ConditionalGathererEndpoint string
+	ReportEndpoint              string
+	ReportPullingDelay          time.Duration
+	ReportMinRetryTime          time.Duration
+	ReportPullingTimeout        time.Duration
+	Impersonate                 string
 	// list of gathering functions to enable
 	// if there's a string "ALL", we enable everything
 	// otherwise, each string should consist of 2 parts:
@@ -84,6 +86,7 @@ type Converter func(s *Serialized, cfg *Controller) (*Controller, error)
 func (c *Controller) ToString() string {
 	return fmt.Sprintf("enabled=%t "+
 		"endpoint=%s "+
+		"conditional_gatherer_endpoint=%s "+
 		"interval=%s "+
 		"username=%t "+
 		"token=%t "+
@@ -93,6 +96,7 @@ func (c *Controller) ToString() string {
 		"pollingTimeout=%s",
 		c.Report,
 		c.Endpoint,
+		c.ConditionalGathererEndpoint,
 		c.Interval,
 		len(c.Username) > 0,
 		len(c.Token) > 0,
@@ -106,6 +110,7 @@ func (c *Controller) MergeWith(cfg *Controller) {
 	c.mergeCredentials(cfg)
 	c.mergeInterval(cfg)
 	c.mergeEndpoint(cfg)
+	c.mergeConditionalGathererEndpoint(cfg)
 	c.mergeReport(cfg)
 	c.mergeOCM(cfg)
 	c.mergeHTTP(cfg)
@@ -122,6 +127,12 @@ func (c *Controller) mergeEndpoint(cfg *Controller) {
 	}
 }
 
+func (c *Controller) mergeConditionalGathererEndpoint(cfg *Controller) {
+	if len(cfg.ConditionalGathererEndpoint) > 0 {
+		c.ConditionalGathererEndpoint = cfg.ConditionalGathererEndpoint
+	}
+}
+
 func (c *Controller) mergeReport(cfg *Controller) {
 	if len(cfg.ReportEndpoint) > 0 {
 		c.ReportEndpoint = cfg.ReportEndpoint
@@ -168,6 +179,7 @@ func ToController(s *Serialized, cfg *Controller) (*Controller, error) { // noli
 	cfg.Report = s.Report
 	cfg.StoragePath = s.StoragePath
 	cfg.Endpoint = s.Endpoint
+	cfg.ConditionalGathererEndpoint = s.ConditionalGathererEndpoint
 	cfg.Impersonate = s.Impersonate
 	cfg.Gather = s.Gather
 	cfg.EnableGlobalObfuscation = s.EnableGlobalObfuscation
@@ -254,6 +266,7 @@ func ToDisconnectedController(s *Serialized, cfg *Controller) (*Controller, erro
 	cfg.Impersonate = s.Impersonate
 	cfg.Gather = s.Gather
 	cfg.EnableGlobalObfuscation = s.EnableGlobalObfuscation
+	cfg.ConditionalGathererEndpoint = s.ConditionalGathererEndpoint
 
 	if len(s.Interval) > 0 {
 		d, err := time.ParseDuration(s.Interval)
```