-
-
Notifications
You must be signed in to change notification settings - Fork 150
csv: remove old workaround in UTF8Reader #546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
4ca624a
274fa48
5bff1be
deda3dd
cec1c69
45ffa49
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -25,6 +25,16 @@ public final class UTF8Reader | |
|
||
private byte[] _inputBuffer; | ||
|
||
/** | ||
* Flag set to indicate {@code inputBuffer} is read-only, and its | ||
* content should not be modified. This is the case when caller | ||
* has passed in a buffer of contents already read, instead of Jackson | ||
* allocating read buffer. | ||
* | ||
* @since 2.19 | ||
*/ | ||
private final boolean _inputBufferReadOnly; | ||
|
||
/** | ||
* Pointer to the next available byte (if any), iff less than | ||
* <code>mByteBufferEnd</code> | ||
|
@@ -73,7 +83,10 @@ public UTF8Reader(IOContext ctxt, InputStream in, boolean autoClose, | |
_inputBuffer = buf; | ||
_inputPtr = ptr; | ||
_inputEnd = ptr+len; | ||
_autoClose = autoClose; | ||
_autoClose = autoClose; | ||
// Unmodifiable if there is no stream to actually read from | ||
// (ideally caller should pass explicitly) | ||
_inputBufferReadOnly = (in == null); | ||
} | ||
|
||
public UTF8Reader(IOContext ctxt, byte[] buf, int ptr, int len) | ||
|
@@ -85,6 +98,8 @@ public UTF8Reader(IOContext ctxt, byte[] buf, int ptr, int len) | |
_inputPtr = ptr; | ||
_inputEnd = ptr+len; | ||
_autoClose = true; | ||
// This is the case when we have a buffer of contents already read | ||
_inputBufferReadOnly = true; | ||
} | ||
|
||
public UTF8Reader(IOContext ctxt, InputStream in, boolean autoClose) | ||
|
@@ -96,15 +111,8 @@ public UTF8Reader(IOContext ctxt, InputStream in, boolean autoClose) | |
_inputPtr = 0; | ||
_inputEnd = 0; | ||
_autoClose = autoClose; | ||
} | ||
|
||
/** | ||
* Method that can be used to see if we can actually modify the | ||
* underlying buffer. This is the case if we are managing the buffer, | ||
* but not if it was just given to us. | ||
*/ | ||
protected final boolean canModifyBuffer() { | ||
return (_ioContext != null); | ||
// Buffer allocated above, modifiable as needed | ||
_inputBufferReadOnly = false; | ||
} | ||
|
||
/* | ||
|
@@ -400,27 +408,17 @@ private boolean loadMore(int available) throws IOException | |
{ | ||
_byteCount += (_inputEnd - available); | ||
|
||
// Bytes that need to be moved to the beginning of buffer? | ||
if (available > 0) { | ||
// Should we move bytes to the beginning of buffer? | ||
if (_inputPtr > 0) { | ||
if (!canModifyBuffer()) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do you feel this is no longer needed? This is to prevent reader from modifying buffer caller gave, in case moving would occur. If anything, looks like it should avoid copying even if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, I see. It'd solve #497? We do need to avoid modifying buffer, however, I don't think this PR quite works in that sense. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The test pass without this check. The existing tests and we gain an extra working test (the 497 test). I don't know how to rewrite this because I don't have a broken test to work with. My main aim is not to have the 497 test (which is a valid test) fail. I'm not against revisiting this but since this change breaks no tests (other than minor woring diff in exception message on 1 test) and fixes one broken test, I'd prefer to remove this check for now. If this was merged, I would follow up and remove the same check in the TOML and YAML code bases because there is no proof that this check helps there either. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It'd be nice to have a test to verify underlying buffer is not modified when it shouldn't be -- which is what check does -- but even without that we should not make change that will now unexpectedly do that. Specifically this is the case where caller passes There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added back the code check for whether the buffer is modifiable - but instead of throwing an exception, I clone the buffer to avoid modifying the original. The cloning only happens in rare cases but I think it is worth the runtime hit to fix the broken test. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I went ahead and used the idea, just rewrote slightly. Thank you for test to verify buffer not modified! Ideally wouldn't need to move things around, will see what I can do there. But at least I can go ahead and merge this now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Actually, can just remove copying action altogether. |
||
// 15-Aug-2022, tatu: Occurs (only) if we have half-decoded UTF-8 | ||
// characters; uncovered by: | ||
// | ||
// https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50036 | ||
// | ||
// and need to be reported as IOException | ||
if (_inputSource == null) { | ||
throw new IOException(String.format( | ||
"End-of-input after first %d byte(s) of a UTF-8 character: needed at least one more", | ||
available)); | ||
// Can only do so if buffer mutable | ||
if (!_inputBufferReadOnly) { | ||
for (int i = 0; i < available; ++i) { | ||
_inputBuffer[i] = _inputBuffer[_inputPtr+i]; | ||
} | ||
_inputPtr = 0; | ||
_inputEnd = available; | ||
} | ||
for (int i = 0; i < available; ++i) { | ||
_inputBuffer[i] = _inputBuffer[_inputPtr+i]; | ||
} | ||
_inputPtr = 0; | ||
_inputEnd = available; | ||
} | ||
} else { | ||
// Ok; here we can actually reasonably expect an EOF, so let's do a separate read right away: | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -6,16 +6,19 @@ | |
import org.junit.jupiter.api.Test; | ||
|
||
import com.fasterxml.jackson.dataformat.csv.CsvMapper; | ||
import com.fasterxml.jackson.dataformat.csv.ModuleTestBase; | ||
|
||
import static org.assertj.core.api.Assertions.fail; | ||
import static org.junit.jupiter.api.Assertions.assertArrayEquals; | ||
|
||
/** | ||
* Collection of OSS-Fuzz found issues for CSV format module. | ||
*/ | ||
public class FuzzCSVReadTest extends StreamingCSVReadTest | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure why base class was chosen that way; was wrong. :) |
||
public class FuzzCSVReadTest extends ModuleTestBase | ||
{ | ||
private final CsvMapper CSV_MAPPER = mapperForCsv(); | ||
private final byte[] INPUT = new byte[] { 0x20, (byte) 0xCD }; | ||
private final byte[] CLONED = INPUT.clone(); | ||
|
||
// https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=50036 | ||
@Test | ||
|
@@ -25,8 +28,9 @@ public void testUTF8Decoding50036() throws Exception | |
CSV_MAPPER.readTree(INPUT); | ||
fail("Should not pass"); | ||
} catch (IOException e) { | ||
verifyException(e, "End-of-input after first 1 byte"); | ||
verifyException(e, "of a UTF-8 character"); | ||
verifyException(e, "Unexpected EOF in the middle of a multi-byte UTF-8 character"); | ||
// check input was not modified | ||
assertArrayEquals(CLONED, INPUT); | ||
} | ||
} | ||
|
||
|
@@ -38,6 +42,8 @@ public void testUTF8Decoding50036Stream() throws Exception | |
fail("Should not pass"); | ||
} catch (IOException e) { | ||
verifyException(e, "Unexpected EOF in the middle of a multi-byte UTF-8 character"); | ||
// check input was not modified | ||
assertArrayEquals(CLONED, INPUT); | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although
protected
, not usable from outside, so can just remove.