-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[OpenMP][OMPT] initial_device_num
on initialize
callback set incorrectly
#134451
Comments
@llvm/issue-subscribers-openmp Author: Jan André Reuter (Thyre)
We've recently released Score-P v9.0, having support for the OpenMP target via the OpenMP Tools Interface for the first time.
However, a user quickly noticed an issue, which boils down to a bug in LLVM. typedef int (*ompt_initialize_t) (ompt_function_lookup_t lookup, int initial_device_num, ompt_data_t *tool_data);
a tool cannot safely determine if a device identifier in some target callback is actually the host. LLVM typically uses To reproduce the issue: #include <omp.h>
#include <omp-tools.h>
#include <stdlib.h>
#include <assert.h>
#include <inttypes.h>
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdatomic.h>
/* MAIN */
int main( int argc, char** argv )
{
#pragma omp target
{
printf( "Hello from target region\n" );
}
}
/* OMPT INTERFACE */
#if ( defined( __ppc64__ ) || defined( __powerpc64__ ) || defined( __PPC64__ ) )
#define OMPT_TOOL_CPU_RELAX ( ( void )0 )
#elif ( defined( __x86_64 ) || defined( __x86_64__ ) || defined( __amd64 ) || defined( _M_X64 ) )
#define OMPT_TOOL_CPU_RELAX __asm__ volatile ( "pause" )
#elif ( defined( __aarch64__ ) || defined( __ARM64__ ) || defined( _M_ARM64 ) )
#define OMPT_TOOL_CPU_RELAX __asm__ volatile ( "yield" )
#else
#define OMPT_TOOL_CPU_RELAX ( ( void )0 )
#endif
/* Test-and-test-and-set lock. Mutexes are of type bool */
#define OMPT_TOOL_LOCK( MUTEX ) \
while ( true ) \
{ \
if ( atomic_flag_test_and_set_explicit( &( MUTEX ), memory_order_acquire ) != true ) \
{ \
break; \
} \
OMPT_TOOL_CPU_RELAX; \
}
#define OMPT_TOOL_UNLOCK( MUTEX ) atomic_flag_clear_explicit( &( MUTEX ), memory_order_release );
#define OMPT_TOOL_GUARDED_PRINTF( ... ) \
OMPT_TOOL_LOCK( ompt_tool_printf_mutex ) \
printf( __VA_ARGS__ ); \
OMPT_TOOL_UNLOCK( ompt_tool_printf_mutex )
_Thread_local int32_t ompt_tool_tid = -1; /* thread counter. >= 1 after thread_begin */
atomic_flag ompt_tool_printf_mutex = ATOMIC_FLAG_INIT;
static int
my_initialize_tool( ompt_function_lookup_t lookup,
int initial_device_num,
ompt_data_t* tool_data )
{
OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 " | initial_device_num %d\n",
__FUNCTION__,
ompt_tool_tid,
initial_device_num );
return 1; /* non-zero indicates success */
}
static void
my_finalize_tool( ompt_data_t* tool_data )
{
OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 "\n",
__FUNCTION__,
ompt_tool_tid );
}
ompt_start_tool_result_t*
ompt_start_tool( unsigned int omp_version,
const char* runtime_version )
{
setbuf( stdout, NULL );
OMPT_TOOL_GUARDED_PRINTF( "[%s] tid = %" PRId32 " | omp_version %d | runtime_version = \'%s\'\n",
__FUNCTION__,
ompt_tool_tid,
omp_version,
runtime_version );
static ompt_start_tool_result_t tool = { &my_initialize_tool,
&my_finalize_tool,
ompt_data_none };
return &tool;
} Building the reproducer on a system with at least one GPU (maybe $ clang -fopenmp --offload-arch=sm_75 test.c
$ ./a.out
[ompt_start_tool] tid = -1 | omp_version 201611 | runtime_version = 'LLVM OMP version: 5.0.20140926'
[my_initialize_tool] tid = -1 | initial_device_num 0
Hello from target region
[my_finalize_tool] tid = -1 As one can see |
Looking at the source code a bit, there are two places where OMPT is initialized:
The first case looks correct. However, I think we never see it in the case where offload is used, right? llvm-project/offload/DeviceRTL/src/State.cpp Line 430 in 64b060f
Here, I don't understand when which one is chosen... |
We've recently released Score-P v9.0, having support for the OpenMP target via the OpenMP Tools Interface for the first time.
However, a user quickly noticed an issue, which boils down to a bug in LLVM.
When the OpenMP runtime initializes a connected tool, the following callback will be dispatched:
initial_device_num
is defined as "the value that a call to omp_get_initial_device would return". This itself is defined as being "the value of the device number is the value of omp_initial_device or the value returned by the omp_get_num_devices routine".Looking at the actual returned values however, this is not the case. Instead, LLVM always reports
initial_device_num = 0
. Since LLVM also does not implementompt_get_num_devices
correctlyllvm-project/openmp/runtime/src/ompt-general.cpp
Line 868 in 19e0233
a tool cannot safely determine if a device identifier in some target callback is actually the host.
LLVM typically uses
-1
for this, but tools should not have to rely on guessing this, as it might or might not change in later versions. This also makes supporting different runtimes more complicated.To reproduce the issue:
Building the reproducer on a system with at least one GPU (maybe
-fopenmp-targets=x86_64
works too), one will see the following result:As one can see
initial_device_num
is0
. In my case, it should be either-1
or1
.The text was updated successfully, but these errors were encountered: