Commit 19f4d00

committed
qemu: expose rr
run: expose forgotten -Q, document it
1 parent 2e42a77 commit 19f4d00

5 files changed

+104
-14
lines changed

README.adoc

+58-7
@@ -7011,19 +7011,70 @@ TODO do even more awesome offline post-mortem analysis things, such as:
70117011

70127012
==== QEMU record and replay
70137013

7014-
QEMU supports deterministic record and replay by saving external inputs, which would be awesome to understand the kernel, as you would be able to examine a single run as many times as you would like.
7014+
QEMU runs are not deterministic by default, but QEMU does support a record and replay mechanism that allows you to replay a previous run deterministically.
70157015

7016-
This mechanism first requires a trace to be generated on an initial record run. The trace is then used on the replay runs to make them deterministic.
7016+
This awesome feature allows you to examine a single run as many times as you would like until you understand everything:
70177017

7018-
Unfortunately it is not working in the current QEMU: https://stackoverflow.com/questions/46970215/how-to-use-qemus-deterministic-record-and-replay-feature-for-a-linux-kernel-boo
7018+
....
7019+
# Record a run.
7020+
./run -F '/rand_check.out;/poweroff.out;' -r
7021+
# Replay the run.
7022+
./run -F '/rand_check.out;/poweroff.out;' -R
7023+
....
7024+
7025+
By comparing the terminal output of both runs, we can see that they are exactly the same, including things which normally differ across runs:
7026+
7027+
* timestamps of dmesg output
7028+
* <<rand_check-out>> output
70197029

7020-
Patches were merged in post v2.12.0-rc2 but it crashed for me and I opened a minimized bug report: https://bugs.launchpad.net/qemu/+bug/1762179
7030+
The record and replay feature was revived around QEMU v3.0.0. It existed earlier but had bit-rotted completely. As of v3.0.0 it is still flaky: sometimes we get deadlocks, and only a limited number of command line arguments are supported.
70217031

7022-
We don't expose record and replay on our scripts yet since it was was not very stable, but we will do so when it stabilizes.
7032+
Documented at: https://github.com/qemu/qemu/blob/v2.12.0/docs/replay.txt
7033+
7034+
TODO: using `-r` as above leads to a kernel warning:
7035+
7036+
....
7037+
rcu_sched detected stalls on CPUs/tasks
7038+
....
70237039

7024-
<<rand_check-out>> is a good way to test out if record and replay is actually deterministic.
7040+
TODO: replay deadlocks intermittently at disk operations, last kernel message:
7041+
7042+
....
7043+
EXT4-fs (sda): re-mounted. Opts: block_validity,barrier,user_xattr
7044+
....
7045+
7046+
TODO: replay with network gets stuck:
7047+
7048+
....
7049+
./run -F '/sbin/ifup -a;wget -S google.com;/poweroff.out;' -r
7050+
./run -F '/sbin/ifup -a;wget -S google.com;/poweroff.out;' -R
7051+
....
7052+
7053+
after the message:
7054+
7055+
....
7056+
adding dns 10.0.2.3
7057+
....
7058+
7059+
There is explicit network support on the QEMU patches, but either it is buggy or we are not using the correct magic options.
7060+
7061+
TODO `arm` and `aarch64` only seem to work with initrd since I cannot plug a working IDE disk device? See also: https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg05245.html
7062+
7063+
Then, when I tried with <<initrd>> and no disk:
7064+
7065+
....
7066+
./build -aA -i
7067+
./run -aA -F '/rand_check.out;/poweroff.out;' -i -r
7068+
./run -aA -F '/rand_check.out;/poweroff.out;' -i -R
7069+
....
7070+
7071+
QEMU crashes with:
7072+
7073+
....
7074+
ERROR:replay/replay-time.c:49:replay_read_clock: assertion failed: (replay_file && replay_mutex_locked())
7075+
....
70257076

7026-
Alternatively, https://github.com/mozilla/rr[`mozilla/rr`] claims it is able to run QEMU: but using it would require you to step through QEMU code itself. Likely doable, but do you really want to?
7077+
I had the same error previously on x86-64, but it was fixed: https://bugs.launchpad.net/qemu/+bug/1762179 so maybe they forgot to fix it for `aarch64`?
70277078

70287079
==== QEMU trace multicore
70297080
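The determinism check described in the README hunk above (comparing the terminal output of the record run and the replay run) could be automated roughly as follows. This is only a sketch: `same_run_output` is a hypothetical helper that is not part of this commit, and it assumes you have saved the terminal output of each run to its own file.

```shell
# Hypothetical helper, not part of the commit: compare two saved terminal logs.
# $1: log saved from the record run (-r)
# $2: log saved from the replay run (-R)
same_run_output() {
  record_log="$1"
  replay_log="$2"
  if cmp -s "$record_log" "$replay_log"; then
    # Byte-identical, including dmesg timestamps and rand_check.out output.
    echo "deterministic"
  else
    echo "outputs differ"
    # Show the first differing lines to help locate where replay diverged.
    diff "$record_log" "$replay_log" | head -n 10
    return 1
  fi
}
```

A possible usage, assuming each run's output was captured by hand with something like `./run ... -r | tee record.log` and `./run ... -R | tee replay.log`, would be `same_run_output record.log replay.log`.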

build-usage.adoc

+1
@@ -35,6 +35,7 @@
3535
on top of it.
3636
|`-M` |`VARIANT` |gem5 build variant.
3737
|`-p` | |Pass extra arguments to the `rootfs_post_build_script`.
38+
|`-Q` |`VARIANT` |QEMU build variant.
3839
|`-S` | |Don't build QEMU with SDL support.
3940
Graphics such as X11 won't work, only the terminal.
4041
|`-s` | |Add a custom suffix to the build.

common

+2-1
@@ -69,7 +69,8 @@ set_common_vars() {
6969
common_images_dir="${buildroot_out_dir}/images"
7070
host_dir="${buildroot_out_dir}/host"
7171
common_qemu_run_dir="${out_arch_dir}/qemu/${common_run_id}"
72-
common_qemu_termout_file="${common_qemu_run_dir}/termout.txt"
72+
common_qemu_termout_file="${common_qemu_run_dir}/termout.txt"
73+
common_qemu_rrfile="${common_qemu_run_dir}/rrfile"
7374
common_linux_custom_dir="${build_dir}/linux-custom"
7475
common_linux_variant_dir="${common_linux_custom_dir}.${linux_variant}"
7576
common_qemu_custom_dir="${build_dir}/host-qemu-custom"

run

+40-6
@@ -27,6 +27,7 @@ initramfs=false
2727
memory=256M
2828
nographic=true
2929
prebuilt=false
30+
rr=
3031
root=
3132
tmux=false
3233
tmux_args=
@@ -35,7 +36,7 @@ trace_enabled=false
3536
# just to prevent QEMU from emitting a warning that '' is not valid.
3637
trace_type=pr_manager_run
3738
vnc=
38-
while getopts a:c:DdE:e:F:f:G:ghIiKkL:M:m:N:n:PT:t:U:uVX:x OPT; do
39+
while getopts a:c:DdE:e:F:f:G:ghIiKkL:M:m:N:n:PQ:RrT:t:U:uVX:x OPT; do
3940
case "$OPT" in
4041
a)
4142
arch="$OPTARG"
@@ -108,6 +109,15 @@ while getopts a:c:DdE:e:F:f:G:ghIiKkL:M:m:N:n:PT:t:U:uVX:x OPT; do
108109
P)
109110
prebuilt=true
110111
;;
112+
Q)
113+
common_qemu_variant="$OPTARG"
114+
;;
115+
R)
116+
rr=replay
117+
;;
118+
r)
119+
rr=record
120+
;;
111121
T)
112122
trace_enabled=true
113123
trace_type="$OPTARG"
@@ -209,7 +219,7 @@ ${gem5opts} \
209219
--dtb "${common_gem5_system_dir}/arm/dt/armv8_gem5_v1_big_little_2_2.dtb" \\
210220
--kernel="${common_vmlinux}" \\
211221
--little-cpus=2 \\
212-
${extra_flags} \\
222+
${extra_flags} \
213223
"
214224
else
215225
gem5_common="\
@@ -282,16 +292,39 @@ ${vnc}"
282292
extra_flags="${extra_flags} -initrd '${common_images_dir}/rootfs.cpio' \\
283293
"
284294
fi
295+
296+
# Disk related options.
285297
if "$ramfs"; then
286298
# TODO why is this needed, and why any string works.
287299
root='root=/dev/anything'
288300
else
289301
if [ ! "$arch" = mips64 ]; then
290-
extra_flags="${extra_flags} -drive 'file=${common_images_dir}/rootfs.ext2.qcow2,format=qcow2,if=virtio,snapshot' \\
302+
if [ -n "$rr" ]; then
303+
driveif=none
304+
rrid=',id=img-direct'
305+
root='root=/dev/sda'
306+
else
307+
driveif=virtio
308+
root='root=/dev/vda'
309+
rrid=
310+
fi
311+
extra_flags="${extra_flags} -drive 'file=${common_images_dir}/rootfs.ext2.qcow2,format=qcow2,if=${driveif},snapshot${rrid}' \\
312+
"
313+
if [ -n "$rr" ]; then
314+
extra_flags="${extra_flags} \\
315+
-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay \\
316+
-device ide-hd,drive=img-blkreplay \\
291317
"
292-
root='root=/dev/vda'
318+
fi
293319
fi
294320
fi
321+
322+
if [ -n "$rr" ]; then
323+
extra_flags="${extra_flags} \
324+
-object filter-replay,id=replay,netdev=net0 \\
325+
-icount 'shift=7,rr=${rr},rrfile=${common_qemu_rrfile}' \\
326+
"
327+
fi
295328
case "$arch" in
296329
x86_64)
297330
if "$kgdb"; then
@@ -342,7 +375,8 @@ ${extra_flags} \
342375
mips64)
343376
if ! "$ramfs"; then
344377
root='root=/dev/hda'
345-
extra_flags="${extra_flags} -drive 'file=${common_images_dir}/rootfs.ext2.qcow2,format=qcow2,snapshot' \\
378+
extra_flags="${extra_flags} \
379+
-drive 'file=${common_images_dir}/rootfs.ext2.qcow2,format=qcow2,snapshot' \\
346380
"
347381
fi
348382
cmd="\
@@ -359,7 +393,7 @@ fi
359393
if "$tmux"; then
360394
if "$gem5"; then
361395
eval "./tmu 'sleep 2;./gem5-shell -n ${common_run_id} ${tmux_args};'"
362-
elif "$debug"; then
396+
elif "$debug"; then
363397
eval "./tmu ./rungdb -a '${arch} -L ${common_linux_variant}' -n ${common_run_id} ${tmux_args}"
364398
fi
365399
fi
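For readers who want to see the record and replay options in isolation, the flags that the `run` hunk above appends to `extra_flags` can be collected in one standalone function. This is a sketch rather than the script's actual structure: `rr_qemu_flags` is a hypothetical name, and the option strings are copied from the diff without the shell quoting that `run` needs for its later `eval`.

```shell
# Hypothetical helper, not part of the commit: print the extra QEMU options
# used for a record or replay run, one option per line.
# $1: "record" or "replay"
# $2: path of the replay log (common_qemu_rrfile in the commit)
# $3: path of the qcow2 root filesystem image
rr_qemu_flags() {
  rr="$1"
  rrfile="$2"
  image="$3"
  case "$rr" in
    record|replay) ;;
    *) echo "rr must be 'record' or 'replay'" >&2; return 1 ;;
  esac
  # The disk is attached with if=none and wrapped by the blkreplay driver,
  # then exposed as an IDE disk, which is why root becomes /dev/sda.
  printf '%s\n' \
    "-drive file=${image},format=qcow2,if=none,snapshot,id=img-direct" \
    "-drive driver=blkreplay,if=none,image=img-direct,id=img-blkreplay" \
    "-device ide-hd,drive=img-blkreplay" \
    "-object filter-replay,id=replay,netdev=net0" \
    "-icount shift=7,rr=${rr},rrfile=${rrfile}"
}
```

Calling `rr_qemu_flags record "$common_qemu_rrfile" "${common_images_dir}/rootfs.ext2.qcow2"` would print the same five options that the diff adds when `-r` is given.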

run-usage.adoc

+3
@@ -44,7 +44,10 @@
4444
Any
4545
|`-N` |`VARIANT` |gem5 source input variant.
4646
|`-n` | |Run ID.
47+
|`-R` | |Replay a QEMU run record deterministically.
48+
|`-r` | |Record a QEMU run for later replay with `-R`.
4749
|`-P` | |Run the downloaded prebuilt images.
50+
|`-Q` |`VARIANT` |QEMU build variant.
4851
|`-T` |`TRACE_TYPES` |Set trace events to be enabled.
4952
If not given, gem5 tracing is completely disabled, while QEMU tracing
5053
is enabled but uses default traces that are very rare and don't affect
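The way `-r` and `-R` interact is easy to miss when reading the `getopts` loop in `run`: both write to the same `rr` variable, so if both are passed, the last one on the command line wins. A minimal sketch of just that part of the parsing follows; `parse_rr` is a hypothetical function name, and the real script handles many more options.

```shell
# Hypothetical helper, not part of the commit: parse only -r and -R the way
# the run script does, and print the resulting mode (empty if neither given).
parse_rr() {
  rr=
  OPTIND=1
  while getopts Rr OPT; do
    case "$OPT" in
      R) rr=replay ;;
      r) rr=record ;;
      *) return 1 ;;
    esac
  done
  echo "$rr"
}
```

With this sketch, `parse_rr -r` prints `record`, `parse_rr -R` prints `replay`, and `parse_rr -r -R` prints `replay` because the later option overwrites the earlier one.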
