Skip to content

Commit 4d1243b

Browse files
authored
ORT ETW dynamic logging that improves ORT diagnosability & performance (microsoft#18882)
### Description This PR has several combined ORT ETW changes that improve ORT log diagnosability & performance.  - The existing log behavior in the ORT API and Severity behavior remain the same as compiled by the dev using the ORT API - The PR keeps the existing design which has 2 TraceLogging providers defined (although both were not used before this PR) - Keeps great inference (inf) and session load performance even with dynamic logging enabled (see below) - On Windows, when ONNXRuntimeTraceLoggingProvider is enabled, then ORT will dynamically _add_ a new sink reflecting the severity level provided by ETW dynamically. E.G Critical - Verbose per the need at runtime - This allows previous printf style LOGS() statements both for CPU and NPU cases to flow to ETW via a local trace (if enabled) - This DOES NOT add any new Telemetry which can optionally be sent to Microsoft. - Telemetry are ETW events marked with the Measure keyword that _can_ be sampled if a box opts-in - Existing Microsoft.ML.ONNXRuntime events have appropriate keywords and levels added if they were missing - If Execution Providers (EPs) can provide more detailed insight into their HW or component, then this PR allows for those to be dynamically logged instead of just at compile time - In this PR, the QNN EP for QC NPUs can have basic or detailed profiling enabled to give some insight into how the NPU is performing - When the Microsoft.ML.ONNXRuntime ETW provider is enabled with the Profiling keyword and level 5 then QC QNN basic profiling info is output to ETW ### Motivation and Context - This make ORT logging and diagnosability more performant (on Windows) and available in a wider variety of runtime environments. - The performance difference for inf times was ~300x+ drastically better/faster when these logs were output to ETW vs just stdout (Verbose Severity) - This style of ETW dynamic tracing is the widely used standard for Windows components, and even by some 3rd party software since the ETW API is open and part of the Windows API - These ETW based logs can be seamlessly combined with other ETW logs such as an AI component/feature using ORT, OS CPU profiling, scheduling, and more - Before the PR, ORT logging is largely printf style and output to a sink (usually stdout) only if compiled with a certain log Severity. When enabled the previous logging (to stdout) would vastly slow down inference times. Once compiled for release the internal ORT logs were not accessible by anyone except the AI model developer in their dev inner loop. That means logs could not be enabled on a lab machine, or on a production system where the runtime behavior or performance might be different for various reasons on a wide variety of HW. - This change was tested with performance in mind and tested with a mobilenet small AI model with onnxruntime_perf_test - CPU: There was no statistical difference for inf times with the baseline (main) or this PR whether ETW was enabled or not (both ORT providers all keywords level 5). - NPU (QNN on SP9 or Dev Kit 2023 QC SQ3): There was no statistical difference for inf times with this PR whether ETW (both ORT providers all keywords) were enabled or not for Level 5 (Verbose). This is even with QNN Basic profiling turned on and outputting NPU stats to ETW - As expected and designed, there was perf slowdown when Max Level 255 is enabled which translates to QNN Detailed profiling. This mirrors the expected slowdown in the NPU when individual model operations are monitored & recorded as well. This perf is similar to the QNN SDK Detailed profiling performance separate from this PR. This is designed to be above level 5 (verbose) as that is commonly the max level used in many trace profiles and won't affect inf performance. - Other OSes such as Linux & Android are left untouched for now. - Out of scope for this PR but TraceLogging is available for Linux with LTTng tracing. So in the future, this optional tracing could also be made available on other OSes where a TraceLogging API is available
1 parent d0bac82 commit 4d1243b

22 files changed

+591
-123
lines changed

include/onnxruntime/core/common/logging/logging.h

+27
Original file line numberDiff line numberDiff line change
@@ -75,6 +75,21 @@ struct Category {
7575
// TODO: What other high level categories are meaningful? Model? Optimizer? Execution?
7676
};
7777

78+
/// <summary>
79+
/// ORT TraceLogging keywords for categories of dynamic logging enablement
80+
/// </summary>
81+
enum class ORTTraceLoggingKeyword : uint64_t {
82+
Session = 0x1, // ORT Session TraceLoggingWrite
83+
Logs = 0x2, // LOGS() Macro ORT logs. Pair with an appropriate level depending on detail required
84+
Reserved1 = 0x4, // Reserved if we want to add some specific sub-categories instead of just LOGS() or other uses
85+
Reserved2 = 0x8,
86+
Reserved3 = 0x10,
87+
Reserved4 = 0x20,
88+
Reserved5 = 0x40,
89+
Reserved6 = 0x80,
90+
Profiling = 0x100 // Enables profiling. At higher levels >5 can impact inference performance
91+
};
92+
7893
class ISink;
7994
class Logger;
8095
class Capture;
@@ -333,5 +348,17 @@ unsigned int GetThreadId();
333348
*/
334349
unsigned int GetProcessId();
335350

351+
/**
352+
If the ONNXRuntimeTraceLoggingProvider ETW Provider is enabled, then adds to the existing logger.
353+
*/
354+
std::unique_ptr<ISink> EnhanceLoggerWithEtw(std::unique_ptr<ISink> existingLogger, logging::Severity originalSeverity,
355+
logging::Severity etwSeverity);
356+
357+
/**
358+
If the ONNXRuntimeTraceLoggingProvider ETW Provider is enabled, then can override the logging level.
359+
But this overrided level only applies to the ETW sink. The original logger(s) retain their original logging level
360+
*/
361+
Severity OverrideLevelWithEtw(Severity originalSeverity);
362+
336363
} // namespace logging
337364
} // namespace onnxruntime

onnxruntime/core/common/logging/logging.cc

+33
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,8 @@
1212

1313
#ifdef _WIN32
1414
#include <Windows.h>
15+
#include "core/platform/windows/logging/etw_sink.h"
16+
#include "core/common/logging/sinks/composite_sink.h"
1517
#else
1618
#include <unistd.h>
1719
#if defined(__MACH__) || defined(__wasm__) || defined(_AIX)
@@ -243,5 +245,36 @@ unsigned int GetProcessId() {
243245
#endif
244246
}
245247

248+
std::unique_ptr<ISink> EnhanceLoggerWithEtw(std::unique_ptr<ISink> existingLogger, logging::Severity originalSeverity,
249+
logging::Severity etwSeverity) {
250+
#ifdef _WIN32
251+
auto& manager = EtwRegistrationManager::Instance();
252+
if (manager.IsEnabled()) {
253+
auto compositeSink = std::make_unique<CompositeSink>();
254+
compositeSink->AddSink(std::move(existingLogger), originalSeverity);
255+
compositeSink->AddSink(std::make_unique<EtwSink>(), etwSeverity);
256+
return compositeSink;
257+
} else {
258+
return existingLogger;
259+
}
260+
#else
261+
// On non-Windows platforms, just return the existing logger
262+
(void)originalSeverity;
263+
(void)etwSeverity;
264+
return existingLogger;
265+
#endif // _WIN32
266+
}
267+
268+
Severity OverrideLevelWithEtw(Severity originalSeverity) {
269+
#ifdef _WIN32
270+
auto& manager = logging::EtwRegistrationManager::Instance();
271+
if (manager.IsEnabled() &&
272+
(manager.Keyword() & static_cast<ULONGLONG>(onnxruntime::logging::ORTTraceLoggingKeyword::Logs)) != 0) {
273+
return manager.MapLevelToSeverity();
274+
}
275+
#endif // _WIN32
276+
return originalSeverity;
277+
}
278+
246279
} // namespace logging
247280
} // namespace onnxruntime

onnxruntime/core/common/logging/sinks/composite_sink.h

+18-5
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,8 @@
55

66
#include <string>
77
#include <vector>
8+
#include <utility>
9+
#include <memory>
810

911
#include "core/common/logging/isink.h"
1012
#include "core/common/logging/logging.h"
@@ -27,20 +29,31 @@ class CompositeSink : public ISink {
2729
/// Adds a sink. Takes ownership of the sink (so pass unique_ptr by value).
2830
/// </summary>
2931
/// <param name="sink">The sink.</param>
32+
/// <param name="severity">The min severity to send a message to that sink</param>
3033
/// <returns>This instance to allow chaining.</returns>
31-
CompositeSink& AddSink(std::unique_ptr<ISink> sink) {
32-
sinks_.push_back(std::move(sink));
34+
CompositeSink& AddSink(std::unique_ptr<ISink> sink, logging::Severity severity) {
35+
sinks_with_severity_.emplace_back(std::move(sink), severity);
3336
return *this;
3437
}
3538

39+
/// <summary>
40+
/// Gets a const reference to the collection of sinks and min severity for that sink
41+
/// </summary>
42+
/// <returns>A const reference to the vector pair of unique_ptr to ISink and severity.</returns>
43+
const std::vector<std::pair<std::unique_ptr<ISink>, logging::Severity>>& GetSinks() const {
44+
return sinks_with_severity_;
45+
}
46+
3647
private:
3748
void SendImpl(const Timestamp& timestamp, const std::string& logger_id, const Capture& message) override {
38-
for (auto& sink : sinks_) {
39-
sink->Send(timestamp, logger_id, message);
49+
for (auto& sink_pair : sinks_with_severity_) {
50+
if (message.Severity() >= sink_pair.second) {
51+
sink_pair.first->Send(timestamp, logger_id, message);
52+
}
4053
}
4154
}
4255

43-
std::vector<std::unique_ptr<ISink>> sinks_;
56+
std::vector<std::pair<std::unique_ptr<ISink>, logging::Severity>> sinks_with_severity_;
4457
};
4558
} // namespace logging
4659
} // namespace onnxruntime

onnxruntime/core/framework/execution_providers.h

+3
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@
1313
#include "core/graph/graph_viewer.h"
1414
#include "core/common/logging/logging.h"
1515
#ifdef _WIN32
16+
#include <winmeta.h>
1617
#include "core/platform/tracing.h"
1718
#endif
1819

@@ -47,6 +48,8 @@ class ExecutionProviders {
4748
TraceLoggingWrite(
4849
telemetry_provider_handle,
4950
"ProviderOptions",
51+
TraceLoggingKeyword(static_cast<uint64_t>(onnxruntime::logging::ORTTraceLoggingKeyword::Session)),
52+
TraceLoggingLevel(WINEVENT_LEVEL_INFO),
5053
TraceLoggingString(provider_id.c_str(), "ProviderId"),
5154
TraceLoggingString(config_pair.first.c_str(), "Key"),
5255
TraceLoggingString(config_pair.second.c_str(), "Value"));

onnxruntime/core/framework/sequential_executor.cc

+3-3
Original file line numberDiff line numberDiff line change
@@ -181,7 +181,7 @@ class SessionScope {
181181
}
182182

183183
auto& logger = session_state_.Logger();
184-
LOGS(logger, VERBOSE) << "Begin execution";
184+
VLOGS(logger, 0) << "Begin execution";
185185
const SequentialExecutionPlan& seq_exec_plan = *session_state_.GetExecutionPlan();
186186
const auto& exec_plan_vec = seq_exec_plan.execution_plan;
187187
VLOGS(logger, 1) << "Size of execution plan vector: " << exec_plan_vec.size();
@@ -515,7 +515,7 @@ onnxruntime::Status ExecuteKernel(StreamExecutionContext& ctx,
515515
return Status(status.Category(), status.Code(), msg_string);
516516
}
517517
ctx.RecycleNodeInputs(idx);
518-
LOGS(logger, VERBOSE) << "stream " << stream_idx << " launch kernel with idx " << idx;
518+
VLOGS(logger, 0) << "stream " << stream_idx << " launch kernel with idx " << idx;
519519
return Status::OK();
520520
}
521521

@@ -531,7 +531,7 @@ onnxruntime::Status ExecuteThePlan(const SessionState& session_state, gsl::span<
531531
const bool only_execute_path_to_fetches,
532532
bool single_thread_mode) {
533533
auto* execution_plan = session_state.GetExecutionPlan();
534-
LOGS(logger, VERBOSE) << "Number of streams: " << execution_plan->execution_plan.size();
534+
VLOGS(logger, 0) << "Number of streams: " << execution_plan->execution_plan.size();
535535
int32_t valid_streams = 0;
536536
for (auto& stream : execution_plan->execution_plan) {
537537
if (stream && stream->steps_.size() > 0)

onnxruntime/core/framework/stream_execution_context.cc

+1-1
Original file line numberDiff line numberDiff line change
@@ -168,7 +168,7 @@ void StreamExecutionContext::RecycleNodeInputs(onnxruntime::NodeIndex node_index
168168
for (auto idx : execution_plan->node_release_list[node_index]) {
169169
if (--release_plan_[idx] == 0) {
170170
ORT_ENFORCE(frame_.ReleaseMLValue(static_cast<int>(execution_plan->release_actions[idx].value_index)).IsOK());
171-
LOGS(*logger_, VERBOSE) << "ort value " << execution_plan->release_actions[idx].value_index << " released";
171+
VLOGS(*logger_, 0) << "ort value " << execution_plan->release_actions[idx].value_index << " released";
172172
}
173173
}
174174
}

onnxruntime/core/platform/telemetry.cc

+15
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,21 @@ void LogRuntimeError(uint32_t sessionId, const common::Status& status, const cha
1212
env.GetTelemetryProvider().LogRuntimeError(sessionId, status, file, function, line);
1313
}
1414

15+
bool Telemetry::IsEnabled() const {
16+
return false;
17+
}
18+
19+
// Get the current logging level
20+
// The Level defined as uchar is coming from the ETW Enable callback in TraceLoggingRegisterEx.
21+
unsigned char Telemetry::Level() const {
22+
return 0;
23+
}
24+
25+
// Get the current keyword
26+
uint64_t Telemetry::Keyword() const {
27+
return 0;
28+
}
29+
1530
void Telemetry::EnableTelemetryEvents() const {
1631
}
1732

onnxruntime/core/platform/telemetry.h

+8
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,14 @@ class Telemetry {
3838
virtual void DisableTelemetryEvents() const;
3939
virtual void SetLanguageProjection(uint32_t projection) const;
4040

41+
virtual bool IsEnabled() const;
42+
43+
// Get the current logging level
44+
virtual unsigned char Level() const;
45+
46+
// Get the current keyword
47+
virtual uint64_t Keyword() const;
48+
4149
virtual void LogProcessInfo() const;
4250

4351
virtual void LogSessionCreationStart() const;

onnxruntime/core/platform/windows/logging/etw_sink.cc

+97-31
Original file line numberDiff line numberDiff line change
@@ -58,52 +58,116 @@ TRACELOGGING_DEFINE_PROVIDER(etw_provider_handle, "ONNXRuntimeTraceLoggingProvid
5858
#pragma warning(pop)
5959
#endif
6060

61-
// Class to unregister ETW provider at shutdown.
62-
// We expect one static instance to be created for the lifetime of the program.
63-
class EtwRegistrationManager {
64-
public:
65-
static EtwRegistrationManager& Register() {
66-
const HRESULT etw_status = ::TraceLoggingRegister(etw_provider_handle);
67-
68-
if (FAILED(etw_status)) {
69-
ORT_THROW("ETW registration failed. Logging will be broken: " + std::to_string(etw_status));
70-
}
61+
EtwRegistrationManager& EtwRegistrationManager::Instance() {
62+
static EtwRegistrationManager instance;
63+
instance.LazyInitialize();
64+
return instance;
65+
}
7166

72-
// return an instance that is just used to unregister as the program exits
73-
static EtwRegistrationManager instance(etw_status);
74-
return instance;
75-
}
67+
bool EtwRegistrationManager::IsEnabled() const {
68+
std::lock_guard<OrtMutex> lock(provider_change_mutex_);
69+
return is_enabled_;
70+
}
71+
72+
UCHAR EtwRegistrationManager::Level() const {
73+
std::lock_guard<OrtMutex> lock(provider_change_mutex_);
74+
return level_;
75+
}
7676

77-
const HRESULT Status() const noexcept {
78-
return etw_status_;
77+
Severity EtwRegistrationManager::MapLevelToSeverity() {
78+
switch (level_) {
79+
case TRACE_LEVEL_NONE:
80+
return Severity::kFATAL; // There is no none severity option
81+
case TRACE_LEVEL_VERBOSE:
82+
return Severity::kVERBOSE;
83+
case TRACE_LEVEL_INFORMATION:
84+
return Severity::kINFO;
85+
case TRACE_LEVEL_WARNING:
86+
return Severity::kWARNING;
87+
case TRACE_LEVEL_ERROR:
88+
return Severity::kERROR;
89+
case TRACE_LEVEL_CRITICAL:
90+
return Severity::kFATAL;
91+
default:
92+
return Severity::kVERBOSE;
7993
}
94+
}
95+
96+
ULONGLONG EtwRegistrationManager::Keyword() const {
97+
std::lock_guard<OrtMutex> lock(provider_change_mutex_);
98+
return keyword_;
99+
}
80100

81-
~EtwRegistrationManager() {
82-
::TraceLoggingUnregister(etw_provider_handle);
101+
HRESULT EtwRegistrationManager::Status() const {
102+
return etw_status_;
103+
}
104+
105+
void EtwRegistrationManager::RegisterInternalCallback(const EtwInternalCallback& callback) {
106+
std::lock_guard<OrtMutex> lock(callbacks_mutex_);
107+
callbacks_.push_back(callback);
108+
}
109+
110+
void NTAPI EtwRegistrationManager::ORT_TL_EtwEnableCallback(
111+
_In_ LPCGUID SourceId,
112+
_In_ ULONG IsEnabled,
113+
_In_ UCHAR Level,
114+
_In_ ULONGLONG MatchAnyKeyword,
115+
_In_ ULONGLONG MatchAllKeyword,
116+
_In_opt_ PEVENT_FILTER_DESCRIPTOR FilterData,
117+
_In_opt_ PVOID CallbackContext) {
118+
auto& manager = EtwRegistrationManager::Instance();
119+
{
120+
std::lock_guard<OrtMutex> lock(manager.provider_change_mutex_);
121+
manager.is_enabled_ = (IsEnabled != 0);
122+
manager.level_ = Level;
123+
manager.keyword_ = MatchAnyKeyword;
83124
}
125+
manager.InvokeCallbacks(SourceId, IsEnabled, Level, MatchAnyKeyword, MatchAllKeyword, FilterData, CallbackContext);
126+
}
127+
128+
EtwRegistrationManager::~EtwRegistrationManager() {
129+
::TraceLoggingUnregister(etw_provider_handle);
130+
}
84131

85-
private:
86-
ORT_DISALLOW_COPY_ASSIGNMENT_AND_MOVE(EtwRegistrationManager);
132+
EtwRegistrationManager::EtwRegistrationManager() {
133+
}
87134

88-
EtwRegistrationManager(const HRESULT status) noexcept : etw_status_{status} {}
89-
const HRESULT etw_status_;
90-
};
135+
void EtwRegistrationManager::LazyInitialize() {
136+
if (!initialized_) {
137+
std::lock_guard<OrtMutex> lock(init_mutex_);
138+
if (!initialized_) { // Double-check locking pattern
139+
initialized_ = true;
140+
etw_status_ = ::TraceLoggingRegisterEx(etw_provider_handle, ORT_TL_EtwEnableCallback, nullptr);
141+
if (FAILED(etw_status_)) {
142+
ORT_THROW("ETW registration failed. Logging will be broken: " + std::to_string(etw_status_));
143+
}
144+
}
145+
}
146+
}
147+
148+
void EtwRegistrationManager::InvokeCallbacks(LPCGUID SourceId, ULONG IsEnabled, UCHAR Level, ULONGLONG MatchAnyKeyword,
149+
ULONGLONG MatchAllKeyword, PEVENT_FILTER_DESCRIPTOR FilterData,
150+
PVOID CallbackContext) {
151+
std::lock_guard<OrtMutex> lock(callbacks_mutex_);
152+
for (const auto& callback : callbacks_) {
153+
callback(SourceId, IsEnabled, Level, MatchAnyKeyword, MatchAllKeyword, FilterData, CallbackContext);
154+
}
155+
}
91156

92157
void EtwSink::SendImpl(const Timestamp& timestamp, const std::string& logger_id, const Capture& message) {
93158
UNREFERENCED_PARAMETER(timestamp);
94159

95160
// register on first usage
96-
static EtwRegistrationManager& etw_manager = EtwRegistrationManager::Register();
161+
static EtwRegistrationManager& etw_manager = EtwRegistrationManager::Instance();
97162

98163
// do something (not that meaningful) with etw_manager so it doesn't get optimized out
99164
// as we want an instance around to do the unregister
100165
if (FAILED(etw_manager.Status())) {
101166
return;
102167
}
103168

104-
// Do we want to output Verbose level messages via ETW at any point it time?
105169
// TODO: Validate if this filtering makes sense.
106-
if (message.Severity() <= Severity::kVERBOSE || message.DataType() == DataType::USER) {
170+
if (message.DataType() == DataType::USER) {
107171
return;
108172
}
109173

@@ -114,11 +178,13 @@ void EtwSink::SendImpl(const Timestamp& timestamp, const std::string& logger_id,
114178
// TraceLoggingWrite requires (painfully) a compile time constant for the TraceLoggingLevel,
115179
// forcing us to use an ugly macro for the call.
116180
#define ETW_EVENT_NAME "ONNXRuntimeLogEvent"
117-
#define TRACE_LOG_WRITE(level) \
118-
TraceLoggingWrite(etw_provider_handle, ETW_EVENT_NAME, TraceLoggingLevel(level), \
119-
TraceLoggingString(logger_id.c_str(), "logger"), \
120-
TraceLoggingString(message.Category(), "category"), \
121-
TraceLoggingString(message.Location().ToString().c_str(), "location"), \
181+
#define TRACE_LOG_WRITE(level) \
182+
TraceLoggingWrite(etw_provider_handle, ETW_EVENT_NAME, \
183+
TraceLoggingKeyword(static_cast<uint64_t>(onnxruntime::logging::ORTTraceLoggingKeyword::Logs)), \
184+
TraceLoggingLevel(level), \
185+
TraceLoggingString(logger_id.c_str(), "logger"), \
186+
TraceLoggingString(message.Category(), "category"), \
187+
TraceLoggingString(message.Location().ToString().c_str(), "location"), \
122188
TraceLoggingString(message.Message().c_str(), "message"))
123189

124190
const auto severity{message.Severity()};

0 commit comments

Comments
 (0)