Skip to content

bpo-46315: Use fopencookie() to avoid dup() in _PyTokenizer_FindEncodingFilename #32033

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 22, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
``_PyTokenizer_FindEncodingFilename`` now uses ``fdopencookie`` to avoid
``dup`` on Emscripten and WASI.
80 changes: 74 additions & 6 deletions Parser/tokenizer.c
Original file line number Diff line number Diff line change
Expand Up @@ -2072,6 +2072,79 @@ _PyTokenizer_Get(struct tok_state *tok,
return result;
}

#if defined(__wasi__) || defined(__EMSCRIPTEN__)
/* fdopen() with borrowed fd

WASI does not provide dup() and Emscripten's dup() emulation with open()
is slow. Implement fdopen() with fd borrowing on top of fdopencookie().
*/
typedef union {
void *cookie;
int fd;
} borrowed;

static ssize_t
borrow_read(void *cookie, char *buf, size_t size)
{
borrowed b;
b.cookie = cookie;
return read(b.fd, (void *)buf, size);
}

static ssize_t
borrow_write(void *cookie, const char *buf, size_t size)
{
errno = ENOTSUP;
return -1;
}

static int
borrow_seek(void *cookie, off_t *off, int whence)
{
borrowed b;
b.cookie = cookie;
off_t pos;
pos = lseek(b.fd, *off, whence);
if (pos == (off_t)-1) {
return -1;
} else {
*off = pos;
return 0;
}
}

static int
borrow_close(void *cookie)
{
// does not close(fd)
return 0;
}

static FILE *
fdopen_borrow(int fd, const char *mode) {
// only reading is supported
if (strcmp(mode, "r") != 0) {
return NULL;
}
cookie_io_functions_t cookie_io = {
borrow_read, borrow_write, borrow_seek, borrow_close
};
// cookie is just the fd
borrowed b;
b.fd = fd;
return fopencookie(b.cookie, "r", cookie_io);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fopencookie is a non-standard GNU extension. Is this available in all platforms targetted by web assembly? Does this need a configure check?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a configure check in my first version of the patch. I got rid of it because it makes no sense. Both WASI and Emscripten use musl libc as upper half of libc and it always provides fdopencookie.

I wish their was a more elegant way to solve the problem. _PyTokenizer_FindEncodingFilename uses a FILE pointer instead of a file descriptor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I am honestly not very happy with the amount of extra code, but I understand why is needed.

}
#else
static FILE *
fdopen_borrow(int fd, const char *mode) {
fd = _Py_dup(fd);
if (fd < 0) {
return NULL;
}
return fdopen(fd, mode);
}
#endif

/* Get the encoding of a Python file. Check for the coding cookie and check if
the file starts with a BOM.

Expand All @@ -2091,12 +2164,7 @@ _PyTokenizer_FindEncodingFilename(int fd, PyObject *filename)
const char *p_end = NULL;
char *encoding = NULL;

fd = _Py_dup(fd);
if (fd < 0) {
return NULL;
}

fp = fdopen(fd, "r");
fp = fdopen_borrow(fd, "r");
if (fp == NULL) {
return NULL;
}
Expand Down