
Some questions about profile #501

Open
chairman-lu opened this issue Dec 7, 2024 · 1 comment

Comments

@chairman-lu

When I run code like this:

```python
options = onnxruntime.SessionOptions()
options.enable_profiling = True
session = onnxruntime.InferenceSession(model_path, providers=providers, sess_options=options)
session.run_with_iobinding(io_binding)
profile_file = session.end_profiling()
```

I get a JSON profile file, and visualizing it with edge://tracing/ gives the result below:

[Image: tracing visualization of the profile]

I have two questions:

  1. Why does the first Conv cost so much? Does its time include loading data from host to device?
  2. Excluding the first Conv, summing the latencies of the other ops gives a result significantly greater than the time measured by my timing code below. How can I get the exact run time of every single op?

[Image: the reporter's timing test code]
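For question 2, one way to inspect per-op time is to parse the profile JSON yourself: the file written by `end_profiling()` is a list of chrome-trace events, and the per-node entries can be summed by name. A minimal sketch, assuming the events carry a `cat` of `"Node"` and a `dur` field in microseconds (the sample events below are illustrative, not taken from the actual profile):

```python
import json
from collections import defaultdict

def aggregate_op_durations(events):
    """Sum the 'dur' field (microseconds) of node-level events per name.

    `events` is the list parsed from the JSON that end_profiling() writes;
    load it with json.load(open(profile_file)). Field names follow the
    chrome trace format and may differ across ONNX Runtime versions.
    """
    totals = defaultdict(int)
    for ev in events:
        if ev.get("cat") == "Node" and "dur" in ev:
            totals[ev["name"]] += ev["dur"]
    return dict(totals)

# Illustrative events, showing the aggregation only:
sample = [
    {"cat": "Node", "name": "Conv_0_kernel_time", "dur": 1200},
    {"cat": "Node", "name": "Conv_0_kernel_time", "dur": 300},
    {"cat": "Session", "name": "model_run", "dur": 5000},
]
print(aggregate_op_durations(sample))
```

Summing per name across several profiled runs and dividing by the run count gives a steadier per-op average than a single trace.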

@tianleiwu

The first Conv needs to run all candidate kernels to find the fastest one for the given input shape, so it takes longer than subsequent inference runs with the same input shape.
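Because of this kernel search, benchmarks usually discard a few warm-up runs before timing. A minimal sketch of that pattern, where `run_once` stands in for a call such as `lambda: session.run_with_iobinding(io_binding)` (the helper name is my own, not an ONNX Runtime API):

```python
import time
import statistics

def time_after_warmup(run_once, warmup=3, repeats=20):
    """Discard warm-up runs (which may include kernel selection and
    lazy allocations), then report the median latency in seconds."""
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - t0)
    return statistics.median(samples)

# With ONNX Runtime this would be, e.g.:
#   median_s = time_after_warmup(lambda: session.run_with_iobinding(io_binding))
```

The median is reported rather than the mean so that occasional slow runs do not skew the result.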
