encoding/gob: warn about decoding data from untrusted sources

bradfitz · bradfitz · commit 19f73a786bbd · 2017-06-29T03:24:29.000Z
And some double space after period cleanup while I'm here. I guess my previous regexps missed these. My next cleaner should probably use go/ast instead of perl. Updates #20221 Change-Id: Idb051e7ac3a7fb1fb86e015f709e32139d065d92 Reviewed-on: https://go-review.googlesource.com/47094 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Rob Pike <r@golang.org>
diff --git a/src/encoding/gob/decoder.go b/src/encoding/gob/decoder.go
@@ -19,6 +19,10 @@ const tooBig = 1 << 30
 
 // A Decoder manages the receipt of type and data information read from the
 // remote side of a connection.
+//
+// The Decoder does only basic sanity checking on decoded input sizes,
+// and its limits are not configurable. Take caution when decoding gob data
+// from untrusted sources.
 type Decoder struct {
 	mutex        sync.Mutex                              // each item must be received atomically
 	r            io.Reader                               // source of the data
diff --git a/src/encoding/gob/doc.go b/src/encoding/gob/doc.go
@@ -4,7 +4,7 @@
 
 /*
 Package gob manages streams of gobs - binary values exchanged between an
-Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
+Encoder (transmitter) and a Decoder (receiver). A typical use is transporting
 arguments and results of remote procedure calls (RPCs) such as those provided by
 package "net/rpc".
 
@@ -14,28 +14,28 @@ amortizing the cost of compilation.
 
 Basics
 
-A stream of gobs is self-describing.  Each data item in the stream is preceded by
+A stream of gobs is self-describing. Each data item in the stream is preceded by
 a specification of its type, expressed in terms of a small set of predefined
-types.  Pointers are not transmitted, but the things they point to are
+types. Pointers are not transmitted, but the things they point to are
 transmitted; that is, the values are flattened. Nil pointers are not permitted,
 as they have no value. Recursive types work fine, but
-recursive values (data with cycles) are problematic.  This may change.
+recursive values (data with cycles) are problematic. This may change.
 
 To use gobs, create an Encoder and present it with a series of data items as
-values or addresses that can be dereferenced to values.  The Encoder makes sure
-all type information is sent before it is needed.  At the receive side, a
+values or addresses that can be dereferenced to values. The Encoder makes sure
+all type information is sent before it is needed. At the receive side, a
 Decoder retrieves values from the encoded stream and unpacks them into local
 variables.
 
 Types and Values
 
-The source and destination values/types need not correspond exactly.  For structs,
+The source and destination values/types need not correspond exactly. For structs,
 fields (identified by name) that are in the source but absent from the receiving
-variable will be ignored.  Fields that are in the receiving variable but missing
-from the transmitted type or value will be ignored in the destination.  If a field
+variable will be ignored. Fields that are in the receiving variable but missing
+from the transmitted type or value will be ignored in the destination. If a field
 with the same name is present in both, their types must be compatible. Both the
 receiver and transmitter will do all necessary indirection and dereferencing to
-convert between gobs and actual Go values.  For instance, a gob type that is
+convert between gobs and actual Go values. For instance, a gob type that is
 schematically,
 
 	struct { A, B int }
@@ -63,16 +63,16 @@ Attempting to receive into these types will draw a decode error:
 	struct { C, D int }		// no field names in common
 
 Integers are transmitted two ways: arbitrary precision signed integers or
-arbitrary precision unsigned integers.  There is no int8, int16 etc.
-discrimination in the gob format; there are only signed and unsigned integers.  As
+arbitrary precision unsigned integers. There is no int8, int16 etc.
+discrimination in the gob format; there are only signed and unsigned integers. As
 described below, the transmitter sends the value in a variable-length encoding;
 the receiver accepts the value and stores it in the destination variable.
 Floating-point numbers are always sent using IEEE-754 64-bit precision (see
 below).
 
 Signed integers may be received into any signed integer variable: int, int16, etc.;
 unsigned integers may be received into any unsigned integer variable; and floating
-point values may be received into any floating point variable.  However,
+point values may be received into any floating point variable. However,
 the destination variable must be able to represent the value or the decode
 operation will fail.
 
@@ -106,17 +106,17 @@ Encoding Details
 This section documents the encoding, details that are not important for most
 users. Details are presented bottom-up.
 
-An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
-as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
+An unsigned integer is sent one of two ways. If it is less than 128, it is sent
+as a byte with that value. Otherwise it is sent as a minimal-length big-endian
 (high byte first) byte stream holding the value, preceded by one byte holding the
-byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
+byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
 256 is transmitted as (FE 01 00).
 
 A boolean is encoded within an unsigned integer: 0 for false, 1 for true.
 
-A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
+A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
 upward contain the value; bit 0 says whether they should be complemented upon
-receipt.  The encode algorithm looks like this:
+receipt. The encode algorithm looks like this:
 
 	var u uint
 	if i < 0 {
@@ -127,14 +127,14 @@ receipt.  The encode algorithm looks like this:
 	encodeUnsigned(u)
 
 The low bit is therefore analogous to a sign bit, but making it the complement bit
-instead guarantees that the largest negative integer is not a special case.  For
+instead guarantees that the largest negative integer is not a special case. For
 example, -129=^128=(^256>>1) encodes as (FE 01 01).
 
 Floating-point numbers are always sent as a representation of a float64 value.
-That value is converted to a uint64 using math.Float64bits.  The uint64 is then
-byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
-exponent and high-precision part of the mantissa go first.  Since the low bits are
-often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
+That value is converted to a uint64 using math.Float64bits. The uint64 is then
+byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
+exponent and high-precision part of the mantissa go first. Since the low bits are
+often zero, this can save encoding bytes. For instance, 17.0 is encoded in only
 three bytes (FE 31 40).
 
 Strings and slices of bytes are sent as an unsigned count followed by that many
@@ -151,27 +151,27 @@ is nil and not at the top level.
 In slices and arrays, as well as maps, all elements, even zero-valued elements,
 are transmitted, even if all the elements are zero.
 
-Structs are sent as a sequence of (field number, field value) pairs.  The field
-value is sent using the standard gob encoding for its type, recursively.  If a
+Structs are sent as a sequence of (field number, field value) pairs. The field
+value is sent using the standard gob encoding for its type, recursively. If a
 field has the zero value for its type (except for arrays; see above), it is omitted
-from the transmission.  The field number is defined by the type of the encoded
+from the transmission. The field number is defined by the type of the encoded
 struct: the first field of the encoded type is field 0, the second is field 1,
-etc.  When encoding a value, the field numbers are delta encoded for efficiency
+etc. When encoding a value, the field numbers are delta encoded for efficiency
 and the fields are always sent in order of increasing field number; the deltas are
-therefore unsigned.  The initialization for the delta encoding sets the field
+therefore unsigned. The initialization for the delta encoding sets the field
 number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
-delta = 1, unsigned value = 7 or (01 07).  Finally, after all the fields have been
-sent a terminating mark denotes the end of the struct.  That mark is a delta=0
+delta = 1, unsigned value = 7 or (01 07). Finally, after all the fields have been
+sent a terminating mark denotes the end of the struct. That mark is a delta=0
 value, which has representation (00).
 
 Interface types are not checked for compatibility; all interface types are
 treated, for transmission, as members of a single "interface" type, analogous to
-int or []byte - in effect they're all treated as interface{}.  Interface values
+int or []byte - in effect they're all treated as interface{}. Interface values
 are transmitted as a string identifying the concrete type being sent (a name
 that must be pre-defined by calling Register), followed by a byte count of the
 length of the following data (so the value can be skipped if it cannot be
 stored), followed by the usual encoding of concrete (dynamic) value stored in
-the interface value.  (A nil interface value is identified by the empty string
+the interface value. (A nil interface value is identified by the empty string
 and transmits no value.) Upon receipt, the decoder verifies that the unpacked
 concrete item satisfies the interface of the receiving variable.
 
@@ -181,9 +181,9 @@ The only visible effect of this is to encode a zero byte after the value, just a
 after the last field of an encoded struct, so that the decode algorithm knows when
 the top-level value is complete.
 
-The representation of types is described below.  When a type is defined on a given
+The representation of types is described below. When a type is defined on a given
 connection between an Encoder and Decoder, it is assigned a signed integer type
-id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
+id. When Encoder.Encode(v) is called, it makes sure there is an id assigned for
 the type of v and all its elements and then it sends the pair (typeid, encoded-v)
 where typeid is the type id of the encoded type of v and encoded-v is the gob
 encoding of the value v.
@@ -229,7 +229,7 @@ If there are nested type ids, the types for all inner type ids must be defined
 before the top-level type id is used to describe an encoded-v.
 
 For simplicity in setup, the connection is defined to understand these types a
-priori, as well as the basic gob types int, uint, etc.  Their ids are:
+priori, as well as the basic gob types int, uint, etc. Their ids are:
 
 	bool        1
 	int         2
@@ -250,7 +250,7 @@ priori, as well as the basic gob types int, uint, etc.  Their ids are:
 	MapType     23
 
 Finally, each message created by a call to Encode is preceded by an encoded
-unsigned integer count of the number of bytes remaining in the message.  After
+unsigned integer count of the number of bytes remaining in the message. After
 the initial type name, interface values are wrapped the same way; in effect, the
 interface value acts like a recursive invocation of Encode.
 
@@ -262,7 +262,7 @@ where * signifies zero or more repetitions and the type id of a value must
 be predefined or be defined before the value in the stream.
 
 Compatibility: Any future changes to the package will endeavor to maintain
-compatibility with streams encoded using previous versions.  That is, any released
+compatibility with streams encoded using previous versions. That is, any released
 version of this package should be able to decode data written with any previously
 released version, subject to issues such as security fixes. See the Go compatibility
 document for background: https://golang.org/doc/go1compat
@@ -321,7 +321,7 @@ StructValue:
 */
 
 /*
-For implementers and the curious, here is an encoded example.  Given
+For implementers and the curious, here is an encoded example. Given
 	type Point struct {X, Y int}
 and the value
 	p := Point{22, 33}
@@ -332,14 +332,14 @@ the bytes transmitted that encode p will be:
 They are determined as follows.
 
 Since this is the first transmission of type Point, the type descriptor
-for Point itself must be sent before the value.  This is the first type
+for Point itself must be sent before the value. This is the first type
 we've sent on this Encoder, so it has type id 65 (0 through 64 are
 reserved).
 
 	1f	// This item (a type descriptor) is 31 bytes long.
 	ff 81	// The negative of the id for the type we're defining, -65.
 		// This is one byte (indicated by FF = -1) followed by
-		// ^-65<<1 | 1.  The low 1 bit signals to complement the
+		// ^-65<<1 | 1. The low 1 bit signals to complement the
 		// rest upon receipt.
 
 	// Now we send a type descriptor, which is itself a struct (wireType).
@@ -376,7 +376,7 @@ reserved).
 	00	// end of wireType.structType structure
 	00	// end of wireType structure
 
-Now we can send the Point value.  Again the field number resets to -1:
+Now we can send the Point value. Again the field number resets to -1:
 
 	07	// this value is 7 bytes long
 	ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
@@ -393,7 +393,7 @@ output will be just:
 	07 ff 82 01 2c 01 42 00
 
 A single non-struct value at top level is transmitted like a field with
-delta tag 0.  For instance, a signed integer with value 3 presented as
+delta tag 0. For instance, a signed integer with value 3 presented as
 the argument to Encode will emit:
 
 	03 04 00 06