Skip to content

Fixed control chars escaping with enabled emitUTF8 #1176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions src/lib_json/json_tool.h
Original file line number Diff line number Diff line change
Expand Up @@ -64,6 +64,9 @@ static inline String codePointToUTF8(unsigned int cp) {
return result;
}

/// Returns true if ch is a control character (in range [1,31]).
static inline bool isControlCharacter(char ch) { return ch > 0 && ch <= 0x1F; }

enum {
/// Constant that specify the size of the buffer that must be passed to
/// uintToString.
Expand Down
7 changes: 6 additions & 1 deletion src/lib_json/json_writer.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -310,7 +310,12 @@ static String valueToQuotedStringN(const char* value, unsigned length,
// sequence from occurring.
default: {
if (emitUTF8) {
result += *c;
if (isControlCharacter(*c)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
But I think there is no need to add more method.
You can refer these code : https://github.com/nlohmann/json/blob/a861bdb81f1c5ef0adc270952001096b189dbe9e/include/nlohmann/detail/output/serializer.hpp#L406-#L428

emitUTF maybe not a good name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the code you are referring I guess you mean that I should embed that code as part of the current one which is doing the utf8 encoding?

And yeah, encodeUTF8 would be much better name, but regardless of the encoding, control and special characters should be encoded/escaped.

Copy link
Contributor Author

@TheStormN TheStormN May 20, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've squeezed it a bit, but utf8ToCodepoint should not be called when we don't want to do special processing for utf-8, because it offsets the pointer.

result += "\\u";
result += toHex16Bit(*c);
} else {
result += *c;
}
} else {
unsigned int codepoint = utf8ToCodepoint(c, end);
const unsigned int FIRST_NON_CONTROL_CODEPOINT = 0x20;
Expand Down
24 changes: 24 additions & 0 deletions src/test_lib_json/main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2640,6 +2640,30 @@ JSONTEST_FIXTURE_LOCAL(StreamWriterTest, unicode) {
"\"\\t\\n\\ud806\\udca1=\\u0133\\ud82c\\udd1b\\uff67\"\n}");
}

JSONTEST_FIXTURE_LOCAL(StreamWriterTest, controlChars) {
// Create a Json value containing UTF-8 string with some chars that need
// escape (tab,newline, control chars).
const Json::String expected(
"{\n\t\"test\" : "
"\"\\u0001\\u0002\\u0003\\u0004\\u0005\\u0006\\u0007\\b\\t\\n\\u000b\\f\\"
"r\\u000e\\u000f\\u0010\\u0011\\u0012\\u0013\\u0014\\u0015\\u0016\\u0017"
"\\u0018\\u0019\\u001a\\u001b\\u001c\\u001d\\u001e\\u001f\"\n}");

// Create a Json value containing string with controls chars that need escape.
Json::Value root;
root["test"] = "\x1\x2\x3\x4\x5\x6\x7\b\t\n\xB\f\r\xE\xF\x10\x11\x12\x13\x14"
"\x15\x16\x17\x18\x19\x1A\x1B\x1C\x1D\x1E\x1F";

Json::StreamWriterBuilder b;
JSONTEST_ASSERT(Json::writeString(b, root) == expected);

b.settings_["emitUTF8"] = true;
JSONTEST_ASSERT(Json::writeString(b, root) == expected);

b.settings_["emitUTF8"] = false;
JSONTEST_ASSERT(Json::writeString(b, root) == expected);
}

struct ReaderTest : JsonTest::TestCase {
void setStrictMode() {
reader = std::unique_ptr<Json::Reader>(
Expand Down