Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas
Problem Description
I love pandas and use it extensively. One very common use case for me is saving large JSON/JSONL files that describe ML training datasets. Unfortunately, pandas uses ujson under the hood, which automatically escapes forward slashes, and forward slashes are very common in my dataset files because they describe file paths to images, videos, etc.
The escaped file paths cause problems in some (non-pandas) downstream libraries that ingest my JSON/JSONL dataset files. So instead of using the native pandas `.to_json()` function, I have to import the `json` package and write the file manually, which can be much slower for very large files.
I am OK living with this inconvenience, but it seems to me to be a gap in the pandas API. Perhaps adding an option to prevent the escaping would be a good enhancement.
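The manual workaround described above might look roughly like this. It is a minimal sketch, assuming the rows have already been turned into a list of dicts; `write_jsonl` is a hypothetical helper, not a pandas API. Python's standard `json` module does not escape `/`, so file paths are written exactly as stored.

```python
import io
import json
from typing import Any, Iterable, Mapping, TextIO


def write_jsonl(records: Iterable[Mapping[str, Any]], fh: TextIO) -> None:
    """Write one compact JSON object per line (JSON Lines).

    Hypothetical helper, not a pandas API. The stdlib json module does not
    escape "/", so file paths come out exactly as stored.
    """
    for record in records:
        # separators=(",", ":") gives compact output similar to to_json()
        fh.write(json.dumps(record, separators=(",", ":")))
        fh.write("\n")


# Stand-in for df.to_dict(orient="records") on a small dataset index
rows = [
    {"image": "data/images/cat_001.png", "label": "cat"},
    {"image": "data/images/dog_042.png", "label": "dog"},
]

buf = io.StringIO()  # use open("train.jsonl", "w") for a real file
write_jsonl(rows, buf)
print(buf.getvalue(), end="")
# {"image":"data/images/cat_001.png","label":"cat"}
# {"image":"data/images/dog_042.png","label":"dog"}
```

With a DataFrame this would be called as `write_jsonl(df.to_dict(orient="records"), fh)`; values the stdlib cannot serialize (timestamps, for instance) would need converting first. Because every row goes through a Python-level `json.dumps` call, this loop is plausibly where the slowdown on very large files comes from.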
Feature Description
Add a new parameter, `escape_forward_slashes`, to `pandas.DataFrame.to_json()`, or even a `ujson_options` dict.
Alternative Solutions
Instead of using `.to_json()`, you have to write the file manually with the `json` package.
Additional Context
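Also note that the `ujson` project explicitly states:

> this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future.

So it might be worth migrating to `orjson` during this development effort.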
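To make the requested option concrete, here is a sketch of what an `escape_forward_slashes` flag could mean, written against the standard `json` module rather than pandas' internal ujson-based encoder. `dumps_with_slash_option` is a hypothetical name; the parameter name mirrors the `escape_forward_slashes` keyword of the standalone `ujson.dumps`, but whatever pandas might eventually add could look different. Since `/` never appears in JSON outside string literals, and `json.dumps` never emits it as part of an escape sequence, a plain replace reproduces the escaped form safely.

```python
import json
from typing import Any


def dumps_with_slash_option(obj: Any, escape_forward_slashes: bool = True) -> str:
    """Serialize obj to compact JSON, optionally escaping "/" as "\\/".

    Hypothetical sketch of the proposed to_json() option: True mimics the
    current ujson-style output, False leaves file paths untouched.
    """
    text = json.dumps(obj, separators=(",", ":"))
    if escape_forward_slashes:
        # "/" can only occur inside string literals, never in an escape
        # sequence emitted by json.dumps, so replacing every "/" is safe.
        text = text.replace("/", "\\/")
    return text


record = {"video": "clips/run_7/frame.mp4"}
print(dumps_with_slash_option(record))
# {"video":"clips\/run_7\/frame.mp4"}
print(dumps_with_slash_option(record, escape_forward_slashes=False))
# {"video":"clips/run_7/frame.mp4"}
```

Both strings decode to the same object with a standard parser, because `\/` is a legal JSON escape; the option only changes the raw text that downstream tools see.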