34 | 34 | # 4. Using profiler to analyze memory consumption
35 | 35 | # 5. Using tracing functionality
36 | 36 | # 6. Examining stack traces
37 |    | -# 7. Visualizing data as a flame graph
38 |    | -# 8. Using profiler to analyze long-running jobs
   | 37 | +# 7. Using profiler to analyze long-running jobs
39 | 38 | #
40 | 39 | # 1. Import all necessary libraries
41 | 40 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
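
As a point of reference for step 1, a minimal sketch of the imports this kind of profiling script relies on; the exact import list is an assumption, not shown in this diff:

# Assumed imports for the profiler recipe: torch.profiler supplies profile,
# record_function and ProfilerActivity used in the later steps.
import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity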

122 | 121 | # aten::select 1.668ms 2.292ms 8.988us 255
123 | 122 | # --------------------------------- ------------ ------------ ------------ ------------
124 | 123 | # Self CPU time total: 57.549ms
125 |     | -#
    | 124 | +#
126 | 125 |
127 | 126 | ######################################################################
128 | 127 | # Here we see that, as expected, most of the time is spent in convolution (and specifically in ``mkldnn_convolution``
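
For reference, a table like the one quoted above comes from sorting aggregated operator statistics by self CPU time. A minimal sketch, where the model, input shape and row limit are illustrative assumptions rather than values taken from this diff:

import torch
import torchvision.models as models
from torch.profiler import profile, record_function, ProfilerActivity

model = models.resnet18()              # illustrative CPU workload
inputs = torch.randn(5, 3, 224, 224)   # illustrative input batch

# Profile one forward pass on the CPU, recording input shapes as well.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_inference"):
        model(inputs)

# Sorting by self CPU time surfaces hotspots such as aten::mkldnn_convolution.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))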

327 | 326 | #
328 | 327 | # (Warning: stack tracing adds an extra profiling overhead.)
329 | 328 |
330 |     | -
331 |     | -######################################################################
332 |     | -# 7. Visualizing data as a flame graph
333 |     | -# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
334 |     | -#
335 |     | -# Execution time (``self_cpu_time_total`` and ``self_cuda_time_total`` metrics) and stack traces
336 |     | -# can also be visualized as a flame graph. To do this, first export the raw data using ``export_stacks`` (requires ``with_stack=True``):
337 |     | -
338 |     | -prof.export_stacks("/tmp/profiler_stacks.txt", "self_cuda_time_total")
339 |     | -
340 |     | -######################################################################
341 |     | -# We recommend using `Flamegraph tool <https://github.com/brendangregg/FlameGraph>`_ to generate an
342 |     | -# interactive ``.svg`` file:
343 |     | -#
344 |     | -# .. code-block:: sh
345 |     | -#
346 |     | -#    git clone https://github.com/brendangregg/FlameGraph
347 |     | -#    cd FlameGraph
348 |     | -#    ./flamegraph.pl --title "CUDA time" --countname "us." /tmp/profiler_stacks.txt > perf_viz.svg
349 |     | -#
350 |     | -
351 |     | -######################################################################
352 |     | -#
353 |     | -# .. image:: ../../_static/img/perf_viz.png
354 |     | -#    :scale: 25 %
355 |     | -
356 |     | -
357 | 329 | ######################################################################
358 |     | -# 8. Using profiler to analyze long-running jobs
    | 330 | +# 7. Using profiler to analyze long-running jobs
359 | 331 | # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
360 | 332 | #
361 | 333 | # PyTorch profiler offers an additional API to handle long-running jobs
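
A sketch of the long-running-job API the paragraph above refers to: torch.profiler.profile accepts a schedule and an on_trace_ready callback so that only a few iterations of a long loop are actually profiled. The model, loop length and schedule values below are illustrative assumptions:

import torch
import torchvision.models as models
from torch.profiler import profile, schedule, ProfilerActivity

model = models.resnet18()             # illustrative workload
inputs = torch.randn(5, 3, 224, 224)

def trace_handler(prof):
    # Runs at the end of each profiling cycle; print the hottest operators.
    print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))

with profile(
    activities=[ProfilerActivity.CPU],
    # Idle for 1 step, warm up for 1, record 2, and run this cycle twice,
    # which keeps profiling overhead bounded on long-running jobs.
    schedule=schedule(wait=1, warmup=1, active=2, repeat=2),
    on_trace_ready=trace_handler,
) as prof:
    for _ in range(10):
        model(inputs)
        prof.step()  # signal the profiler that one iteration has finished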