Skip to content

csv: remove old workaround in UTF8Reader #546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Mar 12, 2025

Conversation

pjfanning
Copy link
Member

@@ -403,19 +403,6 @@ private boolean loadMore(int available) throws IOException
// Bytes that need to be moved to the beginning of buffer?
if (available > 0) {
if (_inputPtr > 0) {
if (!canModifyBuffer()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you feel this is no longer needed? This is to prevent reader from modifying buffer caller gave, in case moving would occur.

If anything, looks like it should avoid copying even if _inputSource was not null.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. It'd solve #497?

We do need to avoid modifying buffer, however, I don't think this PR quite works in that sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test pass without this check. The existing tests and we gain an extra working test (the 497 test).

I don't know how to rewrite this because I don't have a broken test to work with. My main aim is not to have the 497 test (which is a valid test) fail. I'm not against revisiting this but since this change breaks no tests (other than minor woring diff in exception message on 1 test) and fixes one broken test, I'd prefer to remove this check for now. If this was merged, I would follow up and remove the same check in the TOML and YAML code bases because there is no proof that this check helps there either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to have a test to verify underlying buffer is not modified when it shouldn't be -- which is what check does -- but even without that we should not make change that will now unexpectedly do that.

Specifically this is the case where caller passes byte[] containing UTF-8 content: expectation is that this buffer is not modified. And as such it should not be.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ok, so I extended the tests to see if array input gets modified and it does
  • is there anyway that the parser could be modified to clone the array when we hit this case and to work with the cloned array - to avoid modifying the input array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back the code check for whether the buffer is modifiable - but instead of throwing an exception, I clone the buffer to avoid modifying the original. The cloning only happens in rare cases but I think it is worth the runtime hit to fix the broken test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and used the idea, just rewrote slightly. Thank you for test to verify buffer not modified!

Ideally wouldn't need to move things around, will see what I can do there. But at least I can go ahead and merge this now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, can just remove copying action altogether.

@pjfanning pjfanning marked this pull request as draft March 7, 2025 12:20
@pjfanning pjfanning marked this pull request as ready for review March 7, 2025 12:48
* underlying buffer. This is the case if we are managing the buffer,
* but not if it was just given to us.
*/
protected final boolean canModifyBuffer() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although protected, not usable from outside, so can just remove.


/**
* Collection of OSS-Fuzz found issues for CSV format module.
*/
public class FuzzCSVReadTest extends StreamingCSVReadTest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why base class was chosen that way; was wrong. :)

@cowtowncoder cowtowncoder merged commit b3d13a7 into FasterXML:2.19 Mar 12, 2025
4 checks passed
@pjfanning pjfanning deleted the csv-fuzz branch March 12, 2025 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants