Commit c475629

Gamal Sallam authored and facebook-github-bot committed
Perfmap C-API and JIT integration
Summary: With the perf trampoline writing to the perf-map files, we want a C-API that unifies writes to those files and avoids file corruption from simultaneous writes. We are trying to upstream the API in python/cpython#103546; more details about the motivation are in that PR. In addition to introducing the C-API, we also change the JIT to use it.

Reviewed By: czardoz

Differential Revision: D45421966

fbshipit-source-id: d270cc753a245f93cbfe3d723d0880595fef45f2
1 parent a88f497 commit c475629

12 files changed: +359 -119 lines changed

Cinder/module/known-core-python-exported-symbols

Lines changed: 3 additions & 0 deletions
@@ -1985,6 +1985,9 @@ _PyUnicode_XStrip
 _PyUnion_Type
 _Py_union_type_or
 Py_UniversalNewlineFgets
+PyUnstable_PerfMapState_Fini
+PyUnstable_PerfMapState_Init
+PyUnstable_WritePerfMapEntry
 _Py_UTF8_Edit_Cost
 _Py_VaBuildStack
 _Py_VaBuildStack_SizeT

Doc/c-api/perfmaps.rst

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+
+
+.. highlight:: c
+
+.. _perfmaps:
+
+Support for Perf Maps
+----------------------
+
+On supported platforms (as of this writing, only Linux), the runtime can take
+advantage of *perf map files* to make Python functions visible to an external
+profiling tool (such as `perf <https://perf.wiki.kernel.org/index.php/Main_Page>`_).
+A running process may create a file in the `/tmp` directory, which contains entries
+that can map a section of executable code to a name. This interface is described in the
+`documentation of the Linux Perf tool <https://git.kernel.org/pub/scm/linux/
+kernel/git/torvalds/linux.git/tree/tools/perf/Documentation/jit-interface.txt>`_.
+
+In Python, these helper APIs can be used by libraries and features that rely
+on generating machine code on the fly.
+
+.. c:function:: int PyUnstable_PerfMapState_Init(void)
+   Open the `/tmp/perf-$pid.map` file, unless it's already opened, and create
+   a lock to ensure thread-safe writes to the file (provided the writes are
+   done through :c:func:`PyUnstable_WritePerfMapEntry`). Normally, there's no need
+   to call this explicitly, and it is safe to directly use :c:func:`PyUnstable_WritePerfMapEntry`
+   in your code. If the state isn't already initialized, it will be created on
+   the first call.
+.. c:function:: int PyUnstable_WritePerfMapEntry(const void *code_addr, unsigned int code_size, const char *entry_name)
+   Write one single entry to the `/tmp/perf-$pid.map` file. This function is
+   thread safe. Here is what an example entry looks like::
+      # address size name
+      0x7f3529fcf759 b py::bar:/run/t.py
+   Extensions are encouraged to directly call this API when needed, instead of
+   separately initializing the state by calling :c:func:`PyUnstable_PerfMapState_Init`.
+.. c:function:: int PyUnstable_PerfMapState_Fini(void)
+   Close the perf map file, which was opened in `PyUnstable_PerfMapState_Init`. This
+   API is called by the runtime itself, during interpreter shut-down. In general,
+   there shouldn't be a reason to explicitly call this, except to handle specific
+   scenarios such as forking.
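
Note: the entry format and the lock-protected, lazily initialized write path described in this documentation can be sketched in Python as follows. This is only an illustration, not the C implementation from this commit. The `PerfMapWriter` class, its method names, and `format_perf_map_entry` are hypothetical names introduced here; the line layout (hex address, hex size, name) is the one asserted by `Lib/test/test_perfmaps.py` in this commit.

```python
import threading


def format_perf_map_entry(code_addr: int, code_size: int, entry_name: str) -> str:
    """Return one perf map line: hex address, hex size, then the symbol name."""
    return f"{code_addr:x} {code_size:x} {entry_name}"


class PerfMapWriter:
    """Hypothetical stand-in for the init / write / fini life cycle."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._file = None
        # One lock serializes all writers, in the spirit of map_lock.
        self._lock = threading.Lock()

    def write_entry(self, code_addr: int, code_size: int, entry_name: str) -> None:
        with self._lock:
            if self._file is None:
                # Lazy initialization: the file is opened on the first write.
                self._file = open(self._path, "a")
            line = format_perf_map_entry(code_addr, code_size, entry_name)
            self._file.write(line + "\n")
            self._file.flush()

    def close(self) -> None:
        with self._lock:
            if self._file is not None:
                self._file.close()
                self._file = None
```

Because every write goes through the same lock, lines from concurrent callers cannot interleave, which is the property the unified API is meant to provide.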

Doc/howto/perf_profiling.rst

Lines changed: 2 additions & 1 deletion
@@ -23,7 +23,8 @@ functions to appear in the output of the ``perf`` profiler. When this mode is
 enabled, the interpreter will interpose a small piece of code compiled on the
 fly before the execution of every Python function and it will teach ``perf`` the
 relationship between this piece of code and the associated Python function using
-`perf map files`_.
+`perf map files`_. If you're an extension author interested in having your extension
+write to the perf map files, refer to :doc:`the C-API <../c-api/perfmaps>`.

 .. warning::

Include/internal/pycore_ceval.h

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@ typedef struct {
     void (*write_state)(void* state, const void *code_addr,
                         unsigned int code_size, PyCodeObject* code);
     // Callback to free the trampoline state
-    int (*free_state)(void* state);
+    void (*free_state)(void);
 } _PyPerf_Callbacks;

 extern int _PyPerfTrampoline_SetCallbacks(_PyPerf_Callbacks *);

Include/osmodule.h

Lines changed: 13 additions & 0 deletions
@@ -11,6 +11,19 @@ extern "C" {
 PyAPI_FUNC(PyObject *) PyOS_FSPath(PyObject *path);
 #endif

+#if !defined(Py_LIMITED_API)
+typedef struct {
+    FILE* perf_map;
+    PyThread_type_lock map_lock;
+} PerfMapState;
+
+PyAPI_FUNC(int) PyUnstable_PerfMapState_Init(void);
+
+PyAPI_FUNC(int) PyUnstable_WritePerfMapEntry(const void *code_addr, unsigned int code_size, const char *entry_name);
+
+PyAPI_FUNC(void) PyUnstable_PerfMapState_Fini(void);
+#endif
+
 #ifdef __cplusplus
 }
 #endif

Jit/perf_jitdump.cpp

Lines changed: 154 additions & 34 deletions
@@ -6,6 +6,7 @@
 #include "Jit/pyjit.h"
 #include "Jit/threaded_compile.h"
 #include "Jit/util.h"
+#include "pycore_ceval.h"

 #include <elf.h>
 #include <fcntl.h>
@@ -19,6 +20,10 @@
 #include <cstdio>
 #include <cstring>
 #include <ctime>
+#include <iostream>
+#include <regex>
+#include <sstream>
+#include <tuple>

 #ifdef __x86_64__
 // Use the cheaper rdtsc by default. If you disable this for some reason, or
@@ -242,6 +247,40 @@ void initFiles() {
   inited = true;
 }

+// Parses a JIT entry and returns a tuple containing the
+// code address, code size, and entry name. An example of an entry is:
+// 7fa873c00148 360 __CINDER_JIT:__main__:foo2
+std::tuple<const void*, unsigned int, const char*> parseJitEntry(
+    const char* entry) {
+  std::string_view entry_view = entry;
+  size_t space_pos_1 = entry_view.find(' ');
+
+  // Extract the hexadecimal code address
+  const char* code_addr_str = entry_view.substr(0, space_pos_1).data();
+  unsigned long long code_addr_val = 0;
+  std::from_chars(
+      code_addr_str, code_addr_str + space_pos_1, code_addr_val, 16);
+  const void* code_addr = reinterpret_cast<const void*>(code_addr_val);
+
+  // Find the second space character
+  size_t space_pos_2 = entry_view.find(' ', space_pos_1 + 1);
+
+  // Extract the hexadecimal code size
+  const char* code_size_str =
+      entry_view.substr(space_pos_1 + 1, space_pos_2).data();
+  uint32_t code_size;
+  std::from_chars(
+      code_size_str,
+      code_size_str + (space_pos_2 - space_pos_1 - 1),
+      code_size,
+      16);
+
+  // Extract the entry name
+  const char* entry_name = entry_view.substr(space_pos_2 + 1).data();
+
+  return std::make_tuple(code_addr, code_size, entry_name);
+}
+
 // Copy the contents of from_name to to_name. Returns a std::FILE* at the end
 // of to_name on success, or nullptr on failure.
 std::FILE* copyFile(const std::string& from_name, const std::string& to_name) {
@@ -277,6 +316,74 @@ std::FILE* copyFile(const std::string& from_name, const std::string& to_name) {
   }
 }

+// Copy the contents of the parent perf map file to the child perf map file.
+// Returns 1 on success and 0 on failure.
+int copyJitFile(const std::string& parent_filename) {
+  auto parent_file = std::fopen(parent_filename.c_str(), "r");
+  if (parent_file == nullptr) {
+    JIT_LOG(
+        "Couldn't open %s for reading (%s)",
+        parent_filename,
+        string_error(errno));
+    return 0;
+  }
+
+  char buf[1024];
+  while (std::fgets(buf, sizeof(buf), parent_file) != nullptr) {
+    buf[strcspn(buf, "\n")] = '\0';
+    auto jit_entry = parseJitEntry(buf);
+    try {
+      PyUnstable_WritePerfMapEntry(
+          std::get<0>(jit_entry),
+          std::get<1>(jit_entry),
+          std::get<2>(jit_entry));
+    } catch (const std::invalid_argument& e) {
+      JIT_LOG("Error: Invalid JIT entry: %s \n", buf);
+    }
+  }
+  std::fclose(parent_file);
+  return 1;
+}
+
+// Copy the JIT entries from the parent perf map file to the child perf map
+// file. This is used when perf-trampoline is enabled, as the perf map file
+// will also include trampoline entries. We only want to copy the JIT entries.
+// Returns 1 on success, and 0 on failure.
+int copyJitEntries(const std::string& parent_filename) {
+  auto parent_file = std::fopen(parent_filename.c_str(), "r");
+  if (parent_file == nullptr) {
+    JIT_LOG(
+        "Couldn't open %s for reading (%s)",
+        parent_filename,
+        string_error(errno));
+    return 0;
+  }
+
+  char buf[1024];
+  while (std::fgets(buf, sizeof(buf), parent_file) != nullptr) {
+    if (std::strstr(buf, "__CINDER_") != nullptr) {
+      buf[strcspn(buf, "\n")] = '\0';
+      auto jit_entry = parseJitEntry(buf);
+      try {
+        PyUnstable_WritePerfMapEntry(
+            std::get<0>(jit_entry),
+            std::get<1>(jit_entry),
+            std::get<2>(jit_entry));
+      } catch (const std::invalid_argument& e) {
+        JIT_LOG("Error: Invalid JIT entry: %s \n", buf);
+      }
+    }
+  }
+  std::fclose(parent_file);
+  return 1;
+}
+
+bool isPerfTrampolineActive() {
+  PyThreadState* tstate = PyThreadState_GET();
+  return tstate->interp->eval_frame &&
+      tstate->interp->eval_frame != _PyEval_EvalFrameDefault;
+}
+
 // Copy the perf pid map from the parent process into a new file for this child
 // process.
 void copyFileInfo(FileInfo& info) {
@@ -290,33 +397,53 @@ void copyFileInfo(FileInfo& info) {
       fmt::format(fmt::runtime(info.filename_format), getpid());
   info = {};

-  unlink(child_filename.c_str());
-
-  if (_PyJIT_IsEnabled()) {
-    // The JIT is still enabled: copy the file to allow for more compilation in
-    // this process.
-    if (auto new_pid_map = copyFile(parent_filename, child_filename)) {
-      info.filename = child_filename;
-      info.file = new_pid_map;
+  if (parent_filename.starts_with("/tmp/perf-") &&
+      parent_filename.ends_with(".map") && isPerfTrampolineActive()) {
+    if (!copyJitEntries(parent_filename)) {
+      JIT_LOG(
+          "Failed to copy JIT entries from %s to %s",
+          parent_filename,
+          child_filename);
     }
-  } else {
-    // The JIT has been disabled: hard link the file to save disk space. Don't
-    // open it in this process, to avoid messing with the parent's file.
-    if (::link(parent_filename.c_str(), child_filename.c_str()) != 0) {
+  } else if (
+      parent_filename.starts_with("/tmp/perf-") &&
+      parent_filename.ends_with(".map") && _PyJIT_IsEnabled()) {
+    // The JIT is still enabled: copy the file to allow for more compilation
+    // in this process.
+    if (!copyJitFile(parent_filename)) {
       JIT_LOG(
-          "Failed to link %s to %s: %s",
-          child_filename,
+          "Failed to copy perf map file from %s to %s",
           parent_filename,
-          string_error(errno));
+          child_filename);
+    }
+  } else {
+    unlink(child_filename.c_str());
+    if (_PyJIT_IsEnabled()) {
+      // The JIT is still enabled: copy the file to allow for more compilation
+      // in this process.
+      if (auto new_pid_map = copyFile(parent_filename, child_filename)) {
+        info.filename = child_filename;
+        info.file = new_pid_map;
+      }
     } else {
-      // Poke the file's atime to keep tmpwatch at bay.
-      std::FILE* file = std::fopen(parent_filename.c_str(), "r");
-      if (file != nullptr) {
-        std::fclose(file);
+      // The JIT has been disabled: hard link the file to save disk space. Don't
+      // open it in this process, to avoid messing with the parent's file.
+      if (::link(parent_filename.c_str(), child_filename.c_str()) != 0) {
+        JIT_LOG(
+            "Failed to link %s to %s: %s",
+            child_filename,
+            parent_filename,
+            string_error(errno));
+      } else {
+        // Poke the file's atime to keep tmpwatch at bay.
+        std::FILE* file = std::fopen(parent_filename.c_str(), "r");
+        if (file != nullptr) {
+          std::fclose(file);
+        }
       }
+      info.file = nullptr;
+      info.filename = "";
     }
-    info.file = nullptr;
-    info.filename = "";
   }
 }

@@ -353,19 +480,12 @@ void registerFunction(

   initFiles();

-  if (auto file = g_pid_map.file) {
-    for (auto& section_and_size : code_sections) {
-      void* code = section_and_size.first;
-      std::size_t size = section_and_size.second;
-      fmt::print(
-          file,
-          "{:x} {:x} {}:{}\n",
-          reinterpret_cast<uintptr_t>(code),
-          size,
-          prefix,
-          name);
-      std::fflush(file);
-    }
+  for (auto& section_and_size : code_sections) {
+    void* code = section_and_size.first;
+    std::size_t size = section_and_size.second;
+    auto jit_entry = prefix + ":" + name;
+    PyUnstable_WritePerfMapEntry(
+        static_cast<const void*>(code), size, jit_entry.c_str());
   }

   if (auto file = g_jitdump_file.file) {
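
Note: the fork handling in this file re-reads the parent's perf map and re-emits entries through the new API, keeping only JIT entries when the perf trampoline is active. A minimal Python sketch of that parse-and-filter step is shown below, assuming the `address size name` hex layout from the `parseJitEntry` comment. The names `parse_jit_entry` and `jit_entries` are hypothetical, and this is not a translation of the C++ code: unlike the C++ version, it raises `ValueError` on a malformed line.

```python
from typing import Iterable, Iterator, Tuple


def parse_jit_entry(line: str) -> Tuple[int, int, str]:
    """Split 'ADDR SIZE NAME' into (address, size, name); ADDR and SIZE are hex."""
    # maxsplit=2 keeps any spaces that appear inside the entry name.
    addr_str, size_str, name = line.rstrip("\n").split(" ", 2)
    return int(addr_str, 16), int(size_str, 16), name


def jit_entries(
    lines: Iterable[str], marker: str = "__CINDER_"
) -> Iterator[Tuple[int, int, str]]:
    """Yield parsed entries whose text contains the marker, skipping all others."""
    for line in lines:
        if marker in line:
            yield parse_jit_entry(line)
```

Filtering on the `__CINDER_` marker mirrors the `strstr` check in `copyJitEntries`, so trampoline entries already present in the parent's file are not copied into the child's map.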

Lib/test/test_perfmaps.py

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+import os
+import sys
+import unittest
+
+from _testinternalcapi import perf_map_state_teardown, write_perf_map_entry
+
+if sys.platform != 'linux':
+    raise unittest.SkipTest('Linux only')
+
+
+class TestPerfMapWriting(unittest.TestCase):
+    def test_write_perf_map_entry(self):
+        self.assertEqual(write_perf_map_entry(0x1234, 5678, "entry1"), 0)
+        self.assertEqual(write_perf_map_entry(0x2345, 6789, "entry2"), 0)
+        with open(f"/tmp/perf-{os.getpid()}.map") as f:
+            perf_file_contents = f.read()
+            self.assertIn("1234 162e entry1", perf_file_contents)
+            self.assertIn("2345 1a85 entry2", perf_file_contents)
+        perf_map_state_teardown()
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+Introduced :c:func:`PyUnstable_WritePerfMapEntry`, :c:func:`PyUnstable_PerfMapState_Init` and
+:c:func:`PyUnstable_PerfMapState_Fini`. These allow extension modules (JIT compilers in
+particular) to write to perf-map files in a thread safe manner. The
+:doc:`../howto/perf_profiling` also uses these APIs to write
+entries in the perf-map file.

Modules/_testinternalcapi.c

Lines changed: 26 additions & 0 deletions
@@ -410,6 +410,30 @@ test_gc_visit_objects(PyObject *Py_UNUSED(self), PyObject *Py_UNUSED(ignored)) {
     Py_RETURN_NONE;
 }

+static PyObject *
+write_perf_map_entry(PyObject *self, PyObject *args)
+{
+    const void *code_addr;
+    unsigned int code_size;
+    const char *entry_name;
+
+    if (!PyArg_ParseTuple(args, "KIs", &code_addr, &code_size, &entry_name))
+        return NULL;
+
+    int ret = PyUnstable_WritePerfMapEntry(code_addr, code_size, entry_name);
+    if (ret == -1) {
+        PyErr_SetString(PyExc_OSError, "Failed to write performance map entry");
+        return NULL;
+    }
+    return Py_BuildValue("i", ret);
+}
+
+static PyObject *
+perf_map_state_teardown(PyObject *Py_UNUSED(self), PyObject *Py_UNUSED(ignored))
+{
+    PyUnstable_PerfMapState_Fini();
+    Py_RETURN_NONE;
+}

 // These are used in native calling tests, ensure the compiler
 // doesn't hide or remove these symbols
@@ -438,6 +462,8 @@ static PyMethodDef TestMethods[] = {
     {"test_atomic_funcs", test_atomic_funcs, METH_NOARGS},
     {"test_edit_cost", test_edit_cost, METH_NOARGS},
     {"test_gc_visit_objects", test_gc_visit_objects, METH_NOARGS},
+    {"write_perf_map_entry", write_perf_map_entry, METH_VARARGS},
+    {"perf_map_state_teardown", perf_map_state_teardown, METH_NOARGS},
     {NULL, NULL} /* sentinel */
 };
