Commit 25b06fa

Add a script for updating files in the sample archive (#481)

* Add script for updating files in sample archive
* Add info about sample archive update script to README

1 parent 51e4523 · commit 25b06fa

File tree: 2 files changed (+162, -7 lines)

README.md (+53, -7)

````diff
@@ -4,24 +4,32 @@ This cluster operator gathers anonymized system configuration and reports it to
 
 # Table of Contents
 
+- [Insights Operator](#insights-operator)
+  - [Table of Contents](#table-of-contents)
 - [Building](#building)
 - [Testing](#testing)
 - [Documentation](#documentation)
 - [Getting metrics from Prometheus](#getting-metrics-from-prometheus)
   - [Generate the certificate and key](#generate-the-certificate-and-key)
   - [Prometheus metrics provided by Insights Operator](#prometheus-metrics-provided-by-insights-operator)
+  - [Running IO locally](#running-io-locally)
+  - [Running IO on K8s](#running-io-on-k8s)
   - [Getting the data directly from Prometheus](#getting-the-data-directly-from-prometheus)
   - [Debugging Prometheus metrics without valid CA](#debugging-prometheus-metrics-without-valid-ca)
 - [Debugging](#debugging)
   - [Using the profiler](#using-the-profiler)
+    - [Starting IO with the profiler](#starting-io-with-the-profiler)
+    - [Collect profiling data](#collect-profiling-data)
+    - [Analyzing profiling data](#analyzing-profiling-data)
 - [Changelog](#changelog)
   - [Updating the changelog](#updating-the-changelog)
 - [Reported data](#reported-data)
   - [Insights Operator Archive](#insights-operator-archive)
   - [Sample IO archive](#sample-io-archive)
     - [Generating a sample archive](#generating-a-sample-archive)
     - [Formatting archive json files](#formatting-archive-json-files)
     - [Obfuscating an archive](#obfuscating-an-archive)
+    - [Updating the sample archive](#updating-the-sample-archive)
 - [Contributing](#contributing)
 - [Support](#support)
 - [License](#license)
````
````diff
@@ -251,6 +259,44 @@ go run ./cmd/obfuscate-archive/main.go YOUR_ARCHIVE.tar.gz
 where `YOUR_ARCHIVE.tar.gz` is the path to the archive.
 The obfuscated version will be created in the same directory and called `YOUR_ARCHIVE-obfuscated.tar.gz`
 
+### Updating the sample archive
+
+The `docs/insights-archive-sample/` directory contains an example of an Insights
+Operator archive, extracted and with pretty-formatted JSON files.
+In case of any changes that affect multiple files in the archive, it is a good
+idea to regenerate the sample archive to make sure it remains up to date.
+
+There are two ways of updating the sample archive directory automatically.
+Both of them require running the Insights Operator, letting it generate an archive,
+and extracting the archive into an otherwise empty directory.
+
+The script will automatically replace existing files in the sample archive with
+their respective counterparts from the supplied extracted IO archive.
+In the case of files with (partially) randomized names, such as pods or nodes,
+the entire directory is deleted and replaced with a matching directory from
+the new archive if possible.
+Changes made by the script can be checked and reverted using Git.
+The updated JSON files will be automatically pretty-formatted using `jq`,
+which is the only dependency required for running the script.
+
+All existing files in the sample archive can be updated using the following command:
+
+```sh
+./scripts/update_sample_archive.sh <Path of directory with the NEW extracted IO archive>
+```
+
+If you only want to update files containing a certain string pattern,
+you can supply a regular expression as a second optional argument.
+For example, the following command was used to replace JSON files containing
+the `managedFields` field when it was removed from the IO archive to save space:
+
+```sh
+./scripts/update_sample_archive.sh <Path of directory with the NEW extracted IO archive> '"managedFields":'
+```
+
+The path of the sample archive directory is constant relative to
+the path of the script and therefore does not have to be specified explicitly.
+
 # Contributing
 
 See [CONTRIBUTING](CONTRIBUTING.md) for workflow & convention details.
````
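To get a feel for what the optional content-filter argument selects, the step can be imitated with plain `grep`. This is only an illustrative sketch: the temporary directory and file names below are made up and are not part of the repository.

```shell
# Sketch of how a content filter picks archive files (GNU grep assumed).
# The directory layout and file names here are hypothetical.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/config/pod"
printf '{"metadata":{"managedFields":[]}}\n' > "$tmpdir/config/pod/example.json"
printf '{"metadata":{"name":"clean"}}\n' > "$tmpdir/config/version.json"

# Only files whose content matches the filter are listed, and therefore updated.
grep -rl '"managedFields":' "$tmpdir"

rm -rf "$tmpdir"
```

The update script applies the same idea with `grep -rn --include \*.json` over the sample archive directory, so the filter is matched against file contents, not file names.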

scripts/update_sample_archive.sh (new file, +109)

```sh
#!/bin/sh

# Please keep in mind that when the comments mention a "source archive",
# they are referring to a directory containing an _extracted_ IO archive.

if [ -z "$1" ]; then
    >&2 echo "Usage: update_sample_archive.sh <Extracted Archive Source Directory> [JSON Content Filter]"
    exit
fi

# This allows the JSON-finding function to read the filter from
# a global variable instead of having to pass it as an argument.
CONTENT_FILTER="$2"

# Get absolute path of the source IO archive.
SOURCE_PREFIX=$(realpath "$1")/

# Get absolute path of the IO sample archive directory.
SAMPLE_PREFIX=$(realpath "$(dirname "$0")/../docs/insights-archive-sample")/

# Escape dots and brackets (the most likely special characters found in paths)
# with backslashes to prevent breaking the regular expressions.
regexEscape() {
    echo "$1" | sed 's/[][)(}{\.]/\\\0/g'
}

# Escaped version of the directory paths ready to be used in regular expressions.
SOURCE_PREFIX_ESCAPED=$(regexEscape "$SOURCE_PREFIX")
SAMPLE_PREFIX_ESCAPED=$(regexEscape "$SAMPLE_PREFIX")

jq_update_file() {
    source_file="$SOURCE_PREFIX$1"
    if [ ! -f "$source_file" ]; then
        >&2 echo "[WARN] Unable to update file '$1' (file not found in the source archive)"
        return 1
    fi

    sample_file="$SAMPLE_PREFIX$1"
    mkdir -p "${sample_file%/*}"
    jq < "$source_file" > "$sample_file" || exit 1
    echo "[OK] $source_file --> $sample_file"
}

jq_update_dir() {
    source_dir="$SOURCE_PREFIX$1"
    if [ ! -d "$source_dir" ]; then
        >&2 echo "[WARN] Unable to update directory '$1' (directory not found in the source archive)"
        return 1
    fi

    sample_dir="$SAMPLE_PREFIX$1"
    # Delete the old JSON files.
    [ -d "$sample_dir" ] && find "$sample_dir" -name '*.json' -type f -delete
    # Copy and format JSON files from the source archive to the sample archive directory.
    find "$SOURCE_PREFIX$1" -name '*.json' | grep -oP "^${SOURCE_PREFIX_ESCAPED}\K.+" | sort | uniq | while read -r fname; do
        jq_update_file "$fname"
    done
}

# Expression used when looking for unique directories containing found files.
FIND_DIR_EXPR='/(?=[^/:]+'
# Expression used when looking for all found files.
FIND_FILE_EXPR='(?='

# If a content filter was provided, then all JSON files that match the filter in the existing sample archive directory are returned.
# Otherwise, a complete list of JSON files in the existing sample archive directory structure is returned.
# The first argument is used to switch between returning a list of files and a list of unique directories containing said files.
find_jsons() {
    if [ -z "$CONTENT_FILTER" ]; then
        # find "$SOURCE_PREFIX" -iname "*.json" | grep -oP "^${SOURCE_PREFIX_ESCAPED}\K[^:]+${1})" | sort | uniq
        find "$SAMPLE_PREFIX" -iname "*.json" | grep -oP "^${SAMPLE_PREFIX_ESCAPED}\K[^:]+${1})" | sort | uniq
    else
        grep -rn "$SAMPLE_PREFIX" --include \*.json -e "$CONTENT_FILTER" | grep -oP "^${SAMPLE_PREFIX_ESCAPED}\K[^:]+?${1}:)" | sort | uniq
    fi
}

# Return value indicating if the specified directory is known to contain files with randomized names.
# This function only checks the path prefix, which means that subdirectory/file paths can be checked as well.
contains_randomized_names() {
    case "$1" in
        config/certificatesigningrequests/*|\
        config/hostsubnet/*|\
        config/machineconfigs/*|\
        config/node/*|\
        config/persistentvolumes/*|\
        config/pod/*|\
        machinesets/*)
            true
            ;;

        *)
            false
            ;;
    esac
}

# If one of the resources in a directory contains a filter hit, the whole directory must be updated
# because some resource names are randomized and repeated sample archive updates would result in
# size inflation of the sample archive (i.e., more and more pod resource JSONs with each archive update).
# There is a list of directories which contain files with randomized names.
# Remaining directories are handled on a file-by-file basis.
find_jsons "$FIND_DIR_EXPR" | while read -r dir_name; do
    contains_randomized_names "$dir_name" && jq_update_dir "$dir_name"
done

# This handles the remaining files after the entire directories of resources have already been updated.
find_jsons "$FIND_FILE_EXPR" | while read -r file_name; do
    contains_randomized_names "$file_name" || jq_update_file "$file_name"
done
```
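The script's core idiom — escape a directory prefix for use in a regex, then drop it with `\K` in `grep -P` to get archive-relative paths — can be tried in isolation. This sketch assumes GNU grep and GNU sed; the archive path is made up, and it uses the portable `&` (whole match) in `sed` where the script uses the GNU-specific `\0`.

```shell
# Hypothetical absolute path of an extracted archive.
SOURCE_PREFIX="/tmp/extracted-archive/"

# Same escaping as the script's regexEscape, with `&` instead of `\0`.
SOURCE_PREFIX_ESCAPED=$(echo "$SOURCE_PREFIX" | sed 's/[][)(}{\.]/\\&/g')

# `\K` discards everything matched so far, leaving the archive-relative path.
echo "${SOURCE_PREFIX}config/pod/example.json" |
    grep -oP "^${SOURCE_PREFIX_ESCAPED}\K.+"
# → config/pod/example.json
```

This is why `FIND_DIR_EXPR` and `FIND_FILE_EXPR` are only regex fragments: they are appended after the `\K` so that either the containing directory or the full file path survives the match.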
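The prefix check for randomized-name directories can also be exercised on its own. The sketch below is a trimmed copy of the script's `contains_randomized_names`, kept to two of the real prefixes for brevity; because `case` glob patterns match across `/`, full file paths work as inputs too.

```shell
# Trimmed copy of the script's prefix check (two real prefixes only).
contains_randomized_names() {
    case "$1" in
        config/pod/*|\
        machinesets/*)
            true
            ;;
        *)
            false
            ;;
    esac
}

# A pod resource path is flagged as randomized; a fixed-name file is not.
contains_randomized_names "config/pod/openshift-monitoring/some-pod.json" && echo "randomized"
contains_randomized_names "config/version.json" || echo "stable name"
```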