support snakeyaml UnicodeReader #540

Closed
pjfanning opened this issue Feb 25, 2025 · 3 comments
Comments

@pjfanning
Member

pjfanning commented Feb 25, 2025

@cowtowncoder
Member

I'd be -1 for adding configurability at this point.

The original reason for using UTF8Reader was performance (especially for smaller documents). If the performance of UnicodeReader were adequate, I would be OK with the change, though I have some concerns since the SnakeYAML decoder is rather inefficient.

But we still need to resolve #497 for CSV no matter what, so I would ideally target that first.

@pjfanning
Member Author

I understand that CSV still has the #497 issue, but since the UTF8Reader class is copied into the yaml, csv and toml modules, there is a risk that all 3 are still affected in some cases. I might try to set up some JMH testing of UTF8Reader against the Java InputStreamReader to see which is best. Ideally, Jackson shouldn't need special UTF-8 decoding. We've seen with number writing that the Java team sometimes does try to fix their perf issues.
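
As a rough illustration of the comparison proposed above, here is a minimal stdlib-only timing sketch. It is not JMH and its numbers should not be trusted the way a JMH run would be; it only shows the shape of the measurement. `ReaderDecodeTiming`, `ReaderFactory`, `drain` and `timeNanos` are hypothetical names invented for this sketch. Only the JDK `InputStreamReader` is wired in; another decoding reader (such as a module's UTF8Reader) would be plugged in through the same factory hook, with whatever constructor arguments it actually requires.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ReaderDecodeTiming {
    /** Hypothetical hook: how to wrap a byte stream in a decoding Reader. */
    interface ReaderFactory {
        Reader open(InputStream in) throws IOException;
    }

    /** Reads the reader to the end and returns the number of chars decoded. */
    static long drain(Reader r) {
        char[] buf = new char[4096];
        long total = 0;
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    /** Decodes the same document 'rounds' times and returns elapsed nanoseconds. */
    static long timeNanos(ReaderFactory factory, byte[] doc, int rounds) {
        long sink = 0; // accumulated so the decoding work is not dead code
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            try (Reader r = factory.open(new ByteArrayInputStream(doc))) {
                sink += drain(r);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            throw new AssertionError("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        // A small document, since small inputs are the case of interest here.
        byte[] small = "key: v\u00e4rde\nlist: [1, 2, 3]\n".getBytes(StandardCharsets.UTF_8);
        ReaderFactory jdk = in -> new InputStreamReader(in, StandardCharsets.UTF_8);
        timeNanos(jdk, small, 10_000); // crude warm-up, no substitute for JMH
        System.out.println("JDK InputStreamReader ns: " + timeNanos(jdk, small, 100_000));
    }
}
```

A real comparison would move the loop body into JMH `@Benchmark` methods so that warm-up, forking and dead-code elimination are handled by the harness.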

@cowtowncoder
Member

True, ideally we would not need plain UTF-8 decoding readers (as opposed to the JSON codec, where combining decoding and tokenization is a bigger win and makes more sense).
As I recall, the main challenge with the JDK one was simply that it allocates a big, non-reusable decoding buffer.

I'd be a bit more cautious about using the UTF-8 reader from SnakeYAML, FWIW; less so about the JDK version.
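
To make the buffer-allocation point above concrete, here is a minimal sketch of decoding UTF-8 with a `java.nio.charset.CharsetDecoder` whose output buffer is allocated once and reused across documents. This is not how UTF8Reader or the JDK's `InputStreamReader` is implemented; `ReusableUtf8Decoder` is a hypothetical class written only for illustration, and the 256-char buffer size is an arbitrary assumption.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class ReusableUtf8Decoder {
    // Decoder and output buffer are created once and reused for every call.
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    private final CharBuffer out = CharBuffer.allocate(256); // size is an assumption

    /** Decodes a complete UTF-8 document; throws on malformed input. */
    String decode(byte[] bytes) {
        decoder.reset();
        out.clear(); // drop anything left over from a previously failed call
        ByteBuffer in = ByteBuffer.wrap(bytes); // wraps the array, no copy
        StringBuilder sb = new StringBuilder(bytes.length);
        while (true) {
            CoderResult res = decoder.decode(in, out, true);
            if (res.isError()) {
                throw new IllegalArgumentException("malformed UTF-8 input: " + res);
            }
            drainTo(sb);
            if (res.isUnderflow()) {
                break; // all input consumed
            }
            // otherwise the output buffer overflowed: it is empty again, so loop
        }
        while (true) {
            CoderResult res = decoder.flush(out);
            drainTo(sb);
            if (res.isUnderflow()) {
                break;
            }
        }
        return sb.toString();
    }

    /** Moves the decoded chars into sb and makes the buffer writable again. */
    private void drainTo(StringBuilder sb) {
        out.flip();
        sb.append(out);
        out.clear();
    }

    public static void main(String[] args) {
        ReusableUtf8Decoder d = new ReusableUtf8Decoder();
        byte[] doc = "name: caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(d.decode(doc)); // same instance can decode many documents
    }
}
```

The instance is not thread-safe, so a real reader would need one per thread or some form of buffer recycling.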
