
Commit fc3cd22

chore, test: test driver improvements (#3293)
* test: various improvements to the test/test.js driver
1 parent d38bfda commit fc3cd22

11 files changed, +230 -316 lines changed

DEVELOPMENT.md

Lines changed: 30 additions & 97 deletions
@@ -61,90 +61,44 @@ index 94376188..571539aa 100644
 
 # Testing tips
 
-## How to show the slowest TAV tests from a Jenkins build
-
-Jenkins builds of the agent produce a "steps-info.json" artifact that gives
-execution time of each of the build steps. For a build that ran the TAV tests
-we can list the slowest ones via:
-
-```
-npm install -g json # Re-writing this to use jq is an exercise for the reader.
-
-curl -s https://apm-ci.elastic.co/job/apm-agent-nodejs/job/apm-agent-nodejs-mbp/.../artifact/steps-info.json \
-| json -c 'this.displayName==="Run Tests"' -ga durationInMillis state result displayDescription | sort -n -k1 | tail -20
-```
-
-For example:
-
-```
-% curl -s https://apm-ci.elastic.co/job/apm-agent-nodejs/job/apm-agent-nodejs-mbp/job/main/903/artifact/steps-info.json \
-| json -c 'this.displayName==="Run Tests"' -ga durationInMillis state result displayDescription | sort -n -k1 | tail -10
-1940297 FINISHED SUCCESS .ci/scripts/test.sh "14" "fastify" "false"
-2434461 FINISHED SUCCESS .ci/scripts/test.sh "15" "apollo-server-express" "false"
-2867593 FINISHED SUCCESS .ci/scripts/test.sh "16" "apollo-server-express" "false"
-3232404 FINISHED SUCCESS .ci/scripts/test.sh "8" "pg" "false"
-3233514 FINISHED SUCCESS .ci/scripts/test.sh "12" "pg" "false"
-3371890 FINISHED SUCCESS .ci/scripts/test.sh "10" "pg" "false"
-5394174 FINISHED SUCCESS .ci/scripts/test.sh "12" "apollo-server-express" "false"
-5832066 FINISHED SUCCESS .ci/scripts/test.sh "10" "apollo-server-express" "false"
-6481178 FINISHED SUCCESS .ci/scripts/test.sh "14" "apollo-server-express" "false"
-6626799 FINISHED SUCCESS .ci/scripts/test.sh "8" "apollo-server-express" "false"
-```
-
-## How to troubleshoot `Container "$containerId" is unhealthy.` errors
-
-Each "Test" step of a Jenkins CI build uses `docker-compose` to start services
-for testing, and then runs tests in a `node_tests` container. Starting those
-services can fail with the following unhelpful message in the logs:
-
-```
-[2022-09-19T05:55:43.897Z] .ci/scripts/test.sh:250: main(): docker-compose --no-ansi --log-level ERROR -f .ci/docker/docker-compose-all.yml up --exit-code-from node_tests --remove-orphans --abort-on-container-exit node_tests
-...
-[2022-09-19T05:56:23.776Z] ERROR: for node_tests Container "2d979b0c797d" is unhealthy.
+## How to show the slowest TAV tests from a CI run
+
+The [TAV tests](./TESTING.md#tav-tests) run a large test matrix, where each
+step can take a long time (installing and testing a large number of module
+versions). Part of maintaining this is to look at particularly slow steps
+as candidates for speeding up.
+
+./dev-utils/ci-tav-slow-jobs.sh
+
+This script will list all the TAV test steps, from the latest run, with the
+slowest last. For example:
+
+```sh
+% ./dev-utils/ci-tav-slow-jobs.sh | tail
+1256 20m56s test-tav (14, next)
+1307 21m47s test-tav (12, knex)
+1307 21m47s test-tav (14, knex)
+1323 22m03s test-tav (10, pg)
+1386 23m06s test-tav (12, graphql)
+1431 23m51s test-tav (14, tedious)
+1496 24m56s test-tav (10, graphql)
+1508 25m08s test-tav (8, graphql)
+1757 29m17s test-tav (14, graphql)
+1794 29m54s test-tav (10, knex)
 ```
 
-That container ID does not identify *which* of the many service containers is
-the one to fail. Two ways to troubleshoot this are as follows.
 
-First, the Jenkins build will include log files of both Docker container logs
-and Docker events as Jenkins build artifacts, if the "Test" step failed. These
-are collected by filebeat and metricbeat (as configured by the `dockerContext()`
-block in ".ci/Jenkinsfile"). Here is an example querying the metricbeat log of
-docker events for containers that are failing their healthcheck. This uses
-[ecslog](https://github.com/trentm/go-ecslog) to filter and format the log file.
+## Reproducing CI test failures locally
 
-```
-$ ecslog -k 'docker.healthcheck.failingstreak > 0' -i container.image.name,docker.healthcheck docker-16-release-metricbeat.log-20220927.ndjson
-...
-[2022-09-27T14:55:53.857Z] (on apm-ci-immutable-ubuntu-1804-1664290081867003348):
-    container: {
-      "image": {
-        "name": "mongo:6"
-      }
-    }
-    docker: {
-      "healthcheck": {
-        "status": "unhealthy",
-        "failingstreak": 49,
-        "event": {
-          "start_date": "2022-09-27T14:55:53.012Z",
-          "end_date": "2022-09-27T14:55:53.153Z",
-          "exit_code": -1,
-          "output": "OCI runtime exec failed: exec failed: unable to start container process: exec: \"mongo\": executable file not found in $PATH: unknown"
-        }
-      }
-    }
-```
-
-Second, most of the time you should be able to reproduce a "Test" step failure
-locally. Sometimes this requires forcing an update to the latest Docker image
-for some services.
+Most of the time you should be able to reproduce a CI test step failure locally.
+Sometimes this requires forcing an update to the latest Docker image for some
+services.
 
 ```
 $ docker system prune --all --force --volumes # heavy-handed purge of all local Docker data
 ...
 
-$ .ci/scripts/test.sh -b "release" -t "" "16" # or a different value for "16" depending which stage failed
+$ .ci/scripts/test.sh -b "release" "16" # or a different value for "16" depending which stage failed
 ...
 ```
 
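Each line of the new `./dev-utils/ci-tav-slow-jobs.sh` example output pairs a duration in seconds with the same duration written as minutes and zero-padded seconds (1323 becomes `22m03s`), followed by the job name, sorted slowest last. The script itself is not shown in this diff, so what follows is only a minimal Bash sketch of that formatting and sorting step. It assumes the durations have already been collected as `<seconds> <job name>` lines; the function name `format_slow_jobs` is hypothetical and is not taken from the repository.

```sh
#!/usr/bin/env bash
# Hypothetical sketch only; this is NOT the contents of ./dev-utils/ci-tav-slow-jobs.sh.
# Assumes job durations were already collected as "<seconds> <job name>" lines.
set -euo pipefail

# format_slow_jobs (hypothetical name): read "<seconds> <job name>" lines on
# stdin and print "<seconds> <M>m<SS>s <job name>", slowest job last.
format_slow_jobs() {
  # Numeric sort on the first field puts the slowest job at the bottom,
  # so piping the result to `tail` shows the slowest jobs.
  sort -n -k1,1 | while read -r secs name; do
    # Whole minutes, then the remaining seconds padded to two digits.
    printf '%d %dm%02ds %s\n' "$secs" $((secs / 60)) $((secs % 60)) "$name"
  done
}
```

For example, piping the two lines `1794 test-tav (10, knex)` and `1256 test-tav (14, next)` into `format_slow_jobs` prints `1256 20m56s test-tav (14, next)` followed by `1794 29m54s test-tav (10, knex)`, the same layout as the example output. Sorting in ascending order (rather than descending) is what lets `| tail` pick out the slowest jobs.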
@@ -191,7 +145,7 @@ Two maintenance tasks are (a) to keep these three places in sync and (b) to
 know when support for newer versions of module needs to be added. The latter
 is partially handled by automated dependabot PRs (see ".github/dependabot.yml").
 Both tasks are also partially supported by the **`./dev-utils/bitrot.js`** tool.
-It will list inconsistences between ".tav.yaml" and
+It will list inconsistences between ".tav.yml" and
 "supported-technologies.asciidoc", and will note newer releases of a module
 that isn't covered. For example, redis@5 is not covered by the ranges above,
 so the tool looks like this:
@@ -204,27 +158,6 @@ redis bitrot: latest redis@… (released 2022-09-06): is not in .tav.yml range
 
 # Other tips
 
-## How to trigger a benchmark run for a PR
-
-1. Go to the [apm-ci list of apm-agent-nodejs PRs](https://apm-ci.elastic.co/job/apm-agent-nodejs/job/apm-agent-nodejs-mbp/view/change-requests/) and click on your PR.
-2. Click "Build with Parameters" in the left sidebar. (If you don't have "Build with Parameters" then you aren't logged in.)
-3. Select these options to (mostly) *only* run the ["Benchmarks" step](https://github.com/elastic/apm-agent-nodejs/blob/v3.14.0/.ci/Jenkinsfile#L311-L330):
-- [x] Run\_As\_Main\_Branch
-- [x] bench\_ci
-- [ ] tav\_ci
-- [ ] tests\_ci
-- [ ] test\_edge\_ci
-
-Limitation: The current dashboard for benchmark results only shows datapoints
-from the "main" branch. It would be useful to have a separate chart that
-showed PR values.
-
-(Another way to start the "Benchmarks" step is via a GitHub comment
-"run benchmark tests". However, that also triggers the "Test" step
-and, depending on other conditions, the "TAV Test" step -- both of which are
-long and will run before getting to the Benchmarks run.)
-
-
 ## How to test your local agent in Docker
 
 If you are developing on macOS, it can be convenient to test your local
