ENH: add option to save json without escaping forward slashes #61442

Open
1 of 3 tasks
ellisbrown opened this issue May 14, 2025 · 0 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member


ellisbrown commented May 14, 2025

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I love pandas and use it extensively. One very common use case for me is saving large json/jsonl files that describe ML training datasets. Unfortunately, pandas uses (a vendored copy of) ujson under the hood, which automatically escapes forward slashes, writing "/" as "\/". Forward slashes are very common in my dataset files because they appear in the file paths to images, videos, etc.

The escaped file paths cause problems with some (non-pandas) downstream libraries that ingest my json/jsonl dataset files. So instead of using the native pandas .to_json() method, I have to import the json package and write the file manually, which can be much slower for very large files.
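For context, JSON (RFC 8259) permits "\/" as an escape for "/", so a spec-compliant parser reads both spellings as the same string; the breakage presumably comes from consumers that inspect or compare the raw text rather than parsing it. A minimal sketch using only Python's standard json module (the example path is made up):

```python
import json

# Two spellings an encoder may emit for the same value.
escaped = '{"image": "data\\/images\\/0001.jpg"}'  # ujson-style: "/" written as "\/"
unescaped = '{"image": "data/images/0001.jpg"}'    # stdlib json style

# A compliant parser decodes both documents to the same value.
assert json.loads(escaped) == json.loads(unescaped)
print(json.loads(escaped)["image"])

# A tool that searches the raw text instead of parsing it sees different bytes.
print("data/images" in escaped)
print("data/images" in unescaped)
```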

I can live with this inconvenience, but it seems like a gap in the pandas API. Perhaps an option to disable the escaping would be a good enhancement.

Feature Description

Add a new parameter, escape_forward_slashes, to pandas.DataFrame.to_json():

def to_json(self, ..., escape_forward_slashes=True) -> str | None:
    ...

or, more generally, a ujson_options dict (defaulting to None to avoid a mutable default argument):

def to_json(self, ..., ujson_options: dict | None = None) -> str | None:
    ...
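For reference, the standalone ujson package already exposes an escape_forward_slashes keyword on ujson.dumps, which is the kind of switch proposed here. Until pandas offers something similar, one possible stopgap is to post-process the string that to_json returns. The sketch below uses a hypothetical helper, unescape_forward_slashes, that rewrites each "\/" escape as "/" while leaving escaped backslashes ("\\") intact. It is stdlib-only and assumes the input is valid JSON text.

```python
import json
import re

# At each position the alternatives are tried left to right: an escaped
# backslash ("\\") is consumed first, so the backslash that ends it is never
# mistaken for the start of a "\/" escape.
_ESCAPE_PAIR = re.compile(r'\\\\|\\/')


def unescape_forward_slashes(json_text: str) -> str:
    """Rewrite every "\\/" escape in JSON text as a bare "/" (hypothetical helper)."""
    # Turn "\/" into "/" and return escaped backslashes unchanged.
    return _ESCAPE_PAIR.sub(
        lambda m: "/" if m.group(0) == "\\/" else m.group(0), json_text
    )


# Stand-in for encoder output in which every "/" is escaped. json.dumps never
# emits "/" outside of strings, so this replace only touches string contents.
record = {"path": "videos/clip.mp4", "note": "C:\\tmp/x"}
raw = json.dumps(record).replace("/", "\\/")
clean = unescape_forward_slashes(raw)

assert json.loads(clean) == json.loads(raw) == record  # same data either way
print(clean)
```

With pandas this would look something like unescape_forward_slashes(df.to_json(orient="records", lines=True)), followed by writing the string to disk. It keeps the pandas encoder but holds the whole output in memory and adds a regex pass, so it may not suit very large files.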

Alternative Solutions

Instead of

df.to_json(path)

I currently have to write the file manually with the json package:

import json

with open(path, "w") as f:
    json.dump(df.to_dict(orient="records"), f)
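The snippet above writes a single JSON array. For the jsonl files mentioned earlier, a line-per-record variant is also straightforward with the standard library. A minimal sketch, with write_jsonl as a hypothetical helper name:

```python
import json
from collections.abc import Iterable, Mapping
from typing import Any


def write_jsonl(records: Iterable[Mapping[str, Any]], path: str) -> None:
    """Write one JSON object per line (JSON Lines); hypothetical helper.

    The stdlib json encoder does not escape forward slashes.
    """
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            # ensure_ascii=False writes non-ASCII characters as-is (the file is UTF-8);
            # default=str is a fallback for values json cannot encode natively.
            f.write(json.dumps(record, ensure_ascii=False, default=str))
            f.write("\n")
```

Usage would be write_jsonl(df.to_dict(orient="records"), path). One caveat: missing values typically come through as float NaN, which json.dumps writes as the non-standard token NaN, whereas to_json writes null. Converting NaN to None beforehand (or passing allow_nan=False to fail fast) avoids surprising downstream parsers.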

Additional Context

Also note that the ujson project explicitly states:

this library has been put into a maintenance-only mode... Users are encouraged to migrate to orjson which is both much faster and less likely to introduce a surprise buffer overflow vulnerability in the future.

So it might be worth migrating to orjson as part of this work.

@ellisbrown ellisbrown added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels May 14, 2025