Commit 2094a00
DOCSP-43242: Improve UTF-8 validation documentation to clarify validation occurs on decoded data only (#908)
* DOCSP-43242: updating paragraph
* committed incorrect change
* updating note
* changing to documents
* remove note altogether
* adding original note back
* flow + bson strings
* more specific string distinction
* generalizing data
* added warning and lone surrogate information
* removing
* Update source/fundamentals/bson/utf8-validation.txt

Co-authored-by: Jordan Smith <[email protected]>

---------

Co-authored-by: Jordan Smith <[email protected]>
1 parent 1cb8caf commit 2094a00

1 file changed: +7 -6 lines changed

source/fundamentals/bson/utf8-validation.txt

Lines changed: 7 additions & 6 deletions
@@ -25,15 +25,16 @@ processing overhead since it needs to check the data.
 If you *disable* validation, your application avoids the validation processing
 overhead, but cannot guarantee consistent presentation of invalid UTF-8 data.
 
-The driver enables UTF-8 validation by default. It checks documents for any
-characters that are not encoded in a valid UTF-8 format when it transfers data
-between your application and MongoDB.
+By default, the driver enables UTF-8 validation on data from MongoDB.
+It checks incoming documents for any characters that are not encoded in a
+valid UTF-8 format when it parses data sent from MongoDB to your application.
 
 .. note::
 
-   The current version of the {+driver-short+} automatically substitutes
-   invalid UTF-8 characters with alternate valid UTF-8 ones before
-   validation when you send data to MongoDB. Therefore, the validation
+   This version of the {+driver-short+} automatically substitutes invalid
+   `lone surrogates <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String#utf-16_characters_unicode_code_points_and_grapheme_clusters>`__
+   with the `replacement character <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/toWellFormed>`__
+   before validation when you send data to MongoDB. Therefore, the validation
    only throws an error when the setting is enabled and the driver
    receives invalid UTF-8 document data from MongoDB.
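
The asymmetry that the revised note describes can be sketched with the standard ``TextEncoder`` and ``TextDecoder`` globals. This is an illustration under stated assumptions, not the driver's BSON implementation: the helper names ``encodeOutgoing`` and ``decodeIncoming`` are hypothetical, and the ``fatal`` flag only stands in for the driver's validation setting. Encoding a string that contains a lone surrogate yields the replacement character rather than an error, while decoding invalid bytes fails only when validation is requested.

```javascript
// Illustration only: standard Encoding API globals, not the driver's BSON code.

// Hypothetical helper: encode a JavaScript string to UTF-8 bytes.
function encodeOutgoing(value) {
  // TextEncoder replaces each lone surrogate with U+FFFD (bytes EF BF BD).
  return new TextEncoder().encode(value);
}

// Hypothetical helper: decode UTF-8 bytes, optionally validating them.
function decodeIncoming(bytes, validate) {
  // With fatal: true, malformed input throws a TypeError.
  // With fatal: false, malformed sequences are replaced with U+FFFD.
  return new TextDecoder("utf-8", { fatal: validate }).decode(bytes);
}

// "\uD800" is a lone high surrogate, so the string is not well formed.
const outgoing = encodeOutgoing("a\uD800b");

// The encoded bytes are valid UTF-8, so strict decoding succeeds.
const roundTripped = decodeIncoming(outgoing, true);

// The byte 0xFF never appears in valid UTF-8.
const invalidBytes = Uint8Array.of(0x61, 0xff, 0x62);

let validationError = null;
try {
  decodeIncoming(invalidBytes, true);
} catch (err) {
  validationError = err;
}

// Without validation, the invalid byte is silently replaced.
const lenient = decodeIncoming(invalidBytes, false);
```

Because the outgoing path never produces invalid bytes in this sketch, only the incoming path can raise an error, which matches the note's statement that validation throws only for data received from MongoDB.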