Commit 19f73a7

encoding/gob: warn about decoding data from untrusted sources
And some double space after period cleanup while I'm here.
I guess my previous regexps missed these. My next cleaner should
probably use go/ast instead of perl.

Updates #20221

Change-Id: Idb051e7ac3a7fb1fb86e015f709e32139d065d92
Reviewed-on: https://go-review.googlesource.com/47094
Reviewed-by: Ian Lance Taylor <[email protected]>
Reviewed-by: Rob Pike <[email protected]>
1 parent 8aee0b8 commit 19f73a7

2 files changed, +46 -42 lines changed


src/encoding/gob/decoder.go

+4
@@ -19,6 +19,10 @@ const tooBig = 1 << 30

 // A Decoder manages the receipt of type and data information read from the
 // remote side of a connection.
+//
+// The Decoder does only basic sanity checking on decoded input sizes,
+// and its limits are not configurable. Take caution when decoding gob data
+// from untrusted sources.
 type Decoder struct {
 	mutex sync.Mutex // each item must be received atomically
 	r     io.Reader  // source of the data
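Because the Decoder's own limits are not configurable, a caller reading from an untrusted peer may want to cap the input itself. The following is a minimal sketch of one such mitigation, not part of this commit: it wraps the source in io.LimitReader before handing it to gob. The names maxGobBytes, decodeLimited and roundTrip are hypothetical, and the 1 MiB cap is an arbitrary assumption. Note that this bounds only the bytes consumed; it does not bound what the Decoder may allocate for sizes declared inside the stream.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

// maxGobBytes is a hypothetical application-level cap on input size.
const maxGobBytes = 1 << 20 // 1 MiB, an arbitrary choice

// decodeLimited decodes one gob value from r into v, reading at most
// maxGobBytes from r. It creates a fresh Decoder, so it suits a stream
// that carries a single self-contained message.
func decodeLimited(r io.Reader, v interface{}) error {
	return gob.NewDecoder(io.LimitReader(r, maxGobBytes)).Decode(v)
}

// roundTrip encodes in and decodes it again through decodeLimited.
func roundTrip(in []string) ([]string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out []string
	err := decodeLimited(&buf, &out)
	return out, err
}

func main() {
	out, err := roundTrip([]string{"a", "b"})
	fmt.Println(out, err)
}
```

A message larger than the cap is cut short by the LimitReader, so Decode returns an error instead of reading further.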

src/encoding/gob/doc.go

+42 -42
@@ -4,7 +4,7 @@

 /*
 Package gob manages streams of gobs - binary values exchanged between an
-Encoder (transmitter) and a Decoder (receiver).  A typical use is transporting
+Encoder (transmitter) and a Decoder (receiver). A typical use is transporting
 arguments and results of remote procedure calls (RPCs) such as those provided by
 package "net/rpc".

@@ -14,28 +14,28 @@ amortizing the cost of compilation.

 Basics

-A stream of gobs is self-describing.  Each data item in the stream is preceded by
+A stream of gobs is self-describing. Each data item in the stream is preceded by
 a specification of its type, expressed in terms of a small set of predefined
-types.  Pointers are not transmitted, but the things they point to are
+types. Pointers are not transmitted, but the things they point to are
 transmitted; that is, the values are flattened. Nil pointers are not permitted,
 as they have no value. Recursive types work fine, but
-recursive values (data with cycles) are problematic.  This may change.
+recursive values (data with cycles) are problematic. This may change.

 To use gobs, create an Encoder and present it with a series of data items as
-values or addresses that can be dereferenced to values.  The Encoder makes sure
-all type information is sent before it is needed.  At the receive side, a
+values or addresses that can be dereferenced to values. The Encoder makes sure
+all type information is sent before it is needed. At the receive side, a
 Decoder retrieves values from the encoded stream and unpacks them into local
 variables.

 Types and Values

-The source and destination values/types need not correspond exactly.  For structs,
+The source and destination values/types need not correspond exactly. For structs,
 fields (identified by name) that are in the source but absent from the receiving
-variable will be ignored.  Fields that are in the receiving variable but missing
-from the transmitted type or value will be ignored in the destination.  If a field
+variable will be ignored. Fields that are in the receiving variable but missing
+from the transmitted type or value will be ignored in the destination. If a field
 with the same name is present in both, their types must be compatible. Both the
 receiver and transmitter will do all necessary indirection and dereferencing to
-convert between gobs and actual Go values.  For instance, a gob type that is
+convert between gobs and actual Go values. For instance, a gob type that is
 schematically,

 	struct { A, B int }
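The field-matching rules in this hunk can be sketched with a small round trip. Sent, Received and transfer are hypothetical names chosen for illustration; the fields are exported because gob only transmits exported fields. Only B is common to both types, so A is dropped on receipt and C keeps whatever value the destination already held.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Sent and Received are hypothetical types; only field B is shared.
type Sent struct{ A, B int }
type Received struct{ B, C int }

// transfer encodes s and decodes it into a Received whose C field is
// pre-set, to show that fields missing from the stream are left alone.
func transfer(s Sent) (Received, error) {
	var buf bytes.Buffer
	r := Received{C: 99}
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		return r, err
	}
	err := gob.NewDecoder(&buf).Decode(&r)
	return r, err
}

func main() {
	r, err := transfer(Sent{A: 1, B: 2})
	fmt.Println(r.B, r.C, err)
}
```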
@@ -63,16 +63,16 @@ Attempting to receive into these types will draw a decode error:
 	struct { C, D int } // no field names in common

 Integers are transmitted two ways: arbitrary precision signed integers or
-arbitrary precision unsigned integers.  There is no int8, int16 etc.
-discrimination in the gob format; there are only signed and unsigned integers.  As
+arbitrary precision unsigned integers. There is no int8, int16 etc.
+discrimination in the gob format; there are only signed and unsigned integers. As
 described below, the transmitter sends the value in a variable-length encoding;
 the receiver accepts the value and stores it in the destination variable.
 Floating-point numbers are always sent using IEEE-754 64-bit precision (see
 below).

 Signed integers may be received into any signed integer variable: int, int16, etc.;
 unsigned integers may be received into any unsigned integer variable; and floating
-point values may be received into any floating point variable.  However,
+point values may be received into any floating point variable. However,
 the destination variable must be able to represent the value or the decode
 operation will fail.

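As an illustration of the last rule in this hunk, the sketch below sends an int64 and receives it into destinations of different widths. decodeInto is a hypothetical helper, not a gob API. The value 300 fits an int16 but not an int8, so the second decode is expected to report an error rather than truncate.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// decodeInto encodes value as a signed integer and decodes it into dst,
// which should be a pointer to some signed integer variable.
func decodeInto(value int64, dst interface{}) error {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(value); err != nil {
		return err
	}
	return gob.NewDecoder(&buf).Decode(dst)
}

func main() {
	var wide int16
	var small int8
	fmt.Println(decodeInto(300, &wide), wide) // 300 is representable as int16
	fmt.Println(decodeInto(300, &small))      // 300 is not representable as int8
}
```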
@@ -106,17 +106,17 @@ Encoding Details
 This section documents the encoding, details that are not important for most
 users. Details are presented bottom-up.

-An unsigned integer is sent one of two ways.  If it is less than 128, it is sent
-as a byte with that value.  Otherwise it is sent as a minimal-length big-endian
+An unsigned integer is sent one of two ways. If it is less than 128, it is sent
+as a byte with that value. Otherwise it is sent as a minimal-length big-endian
 (high byte first) byte stream holding the value, preceded by one byte holding the
-byte count, negated.  Thus 0 is transmitted as (00), 7 is transmitted as (07) and
+byte count, negated. Thus 0 is transmitted as (00), 7 is transmitted as (07) and
 256 is transmitted as (FE 01 00).

 A boolean is encoded within an unsigned integer: 0 for false, 1 for true.

-A signed integer, i, is encoded within an unsigned integer, u.  Within u, bits 1
+A signed integer, i, is encoded within an unsigned integer, u. Within u, bits 1
 upward contain the value; bit 0 says whether they should be complemented upon
-receipt.  The encode algorithm looks like this:
+receipt. The encode algorithm looks like this:

 	var u uint
 	if i < 0 {
@@ -127,14 +127,14 @@ receipt. The encode algorithm looks like this:
 	encodeUnsigned(u)

 The low bit is therefore analogous to a sign bit, but making it the complement bit
-instead guarantees that the largest negative integer is not a special case.  For
+instead guarantees that the largest negative integer is not a special case. For
 example, -129=^128=(^256>>1) encodes as (FE 01 01).

 Floating-point numbers are always sent as a representation of a float64 value.
-That value is converted to a uint64 using math.Float64bits.  The uint64 is then
-byte-reversed and sent as a regular unsigned integer.  The byte-reversal means the
-exponent and high-precision part of the mantissa go first.  Since the low bits are
-often zero, this can save encoding bytes.  For instance, 17.0 is encoded in only
+That value is converted to a uint64 using math.Float64bits. The uint64 is then
+byte-reversed and sent as a regular unsigned integer. The byte-reversal means the
+exponent and high-precision part of the mantissa go first. Since the low bits are
+often zero, this can save encoding bytes. For instance, 17.0 is encoded in only
 three bytes (FE 31 40).

 Strings and slices of bytes are sent as an unsigned count followed by that many
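The integer and floating-point rules documented in the two hunks above can be checked with a standalone sketch. This is an illustrative reimplementation written from the prose, not the package's internal code; encodeSigned, encodeFloat and hexOf are hypothetical names, and encodeUnsigned only mirrors the name used in the doc comment.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// encodeUnsigned sends values below 128 as a single byte. Larger values
// are sent as the negated byte count followed by the minimal big-endian
// representation of the value.
func encodeUnsigned(u uint64) []byte {
	if u < 128 {
		return []byte{byte(u)}
	}
	n := (bits.Len64(u) + 7) / 8 // minimal number of bytes holding u
	out := []byte{byte(-n)}
	for i := n - 1; i >= 0; i-- {
		out = append(out, byte(u>>(8*uint(i))))
	}
	return out
}

// encodeSigned stores the value in bits 1 upward; bit 0 tells the
// receiver whether to complement the rest.
func encodeSigned(i int64) []byte {
	var u uint64
	if i < 0 {
		u = (^uint64(i) << 1) | 1
	} else {
		u = uint64(i) << 1
	}
	return encodeUnsigned(u)
}

// encodeFloat byte-reverses the IEEE-754 bits so that trailing zero
// bytes of the mantissa become leading zeros and drop out.
func encodeFloat(f float64) []byte {
	return encodeUnsigned(bits.ReverseBytes64(math.Float64bits(f)))
}

// hexOf formats bytes as space-separated lowercase hex.
func hexOf(b []byte) string { return fmt.Sprintf("% x", b) }

func main() {
	fmt.Println(hexOf(encodeUnsigned(0)))   // 00
	fmt.Println(hexOf(encodeUnsigned(7)))   // 07
	fmt.Println(hexOf(encodeUnsigned(256))) // fe 01 00
	fmt.Println(hexOf(encodeSigned(-129)))  // fe 01 01
	fmt.Println(hexOf(encodeFloat(17.0)))   // fe 31 40
}
```

The printed values match the examples given in the documentation text: (00), (07), (FE 01 00), (FE 01 01) and (FE 31 40).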
@@ -151,27 +151,27 @@ is nil and not at the top level.
 In slices and arrays, as well as maps, all elements, even zero-valued elements,
 are transmitted, even if all the elements are zero.

-Structs are sent as a sequence of (field number, field value) pairs.  The field
-value is sent using the standard gob encoding for its type, recursively.  If a
+Structs are sent as a sequence of (field number, field value) pairs. The field
+value is sent using the standard gob encoding for its type, recursively. If a
 field has the zero value for its type (except for arrays; see above), it is omitted
-from the transmission.  The field number is defined by the type of the encoded
+from the transmission. The field number is defined by the type of the encoded
 struct: the first field of the encoded type is field 0, the second is field 1,
-etc.  When encoding a value, the field numbers are delta encoded for efficiency
+etc. When encoding a value, the field numbers are delta encoded for efficiency
 and the fields are always sent in order of increasing field number; the deltas are
-therefore unsigned.  The initialization for the delta encoding sets the field
+therefore unsigned. The initialization for the delta encoding sets the field
 number to -1, so an unsigned integer field 0 with value 7 is transmitted as unsigned
-delta = 1, unsigned value = 7 or (01 07).  Finally, after all the fields have been
-sent a terminating mark denotes the end of the struct.  That mark is a delta=0
+delta = 1, unsigned value = 7 or (01 07). Finally, after all the fields have been
+sent a terminating mark denotes the end of the struct. That mark is a delta=0
 value, which has representation (00).

 Interface types are not checked for compatibility; all interface types are
 treated, for transmission, as members of a single "interface" type, analogous to
-int or []byte - in effect they're all treated as interface{}.  Interface values
+int or []byte - in effect they're all treated as interface{}. Interface values
 are transmitted as a string identifying the concrete type being sent (a name
 that must be pre-defined by calling Register), followed by a byte count of the
 length of the following data (so the value can be skipped if it cannot be
 stored), followed by the usual encoding of concrete (dynamic) value stored in
-the interface value.  (A nil interface value is identified by the empty string
+the interface value. (A nil interface value is identified by the empty string
 and transmits no value.) Upon receipt, the decoder verifies that the unpacked
 concrete item satisfies the interface of the receiving variable.

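The interface rules above imply that the concrete type has to be registered before an interface value can be sent or received. A minimal sketch follows, assuming hypothetical types Shape and Square and a hypothetical helper sendShape. The value is encoded through a pointer to the interface so that it travels as an interface value rather than as a bare Square.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Shape and Square are hypothetical types used only for this sketch.
type Shape interface{ Area() float64 }

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

func init() {
	// Register records the concrete type's name so that both the
	// encoder and the decoder can identify it in the stream.
	gob.Register(Square{})
}

// sendShape encodes in as an interface value and decodes it back.
func sendShape(in Shape) (Shape, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(&in); err != nil {
		return nil, err
	}
	var out Shape
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

func main() {
	out, err := sendShape(Square{Side: 3})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out.Area())
}
```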
@@ -181,9 +181,9 @@ The only visible effect of this is to encode a zero byte after the value, just a
 after the last field of an encoded struct, so that the decode algorithm knows when
 the top-level value is complete.

-The representation of types is described below.  When a type is defined on a given
+The representation of types is described below. When a type is defined on a given
 connection between an Encoder and Decoder, it is assigned a signed integer type
-id.  When Encoder.Encode(v) is called, it makes sure there is an id assigned for
+id. When Encoder.Encode(v) is called, it makes sure there is an id assigned for
 the type of v and all its elements and then it sends the pair (typeid, encoded-v)
 where typeid is the type id of the encoded type of v and encoded-v is the gob
 encoding of the value v.
@@ -229,7 +229,7 @@ If there are nested type ids, the types for all inner type ids must be defined
 before the top-level type id is used to describe an encoded-v.

 For simplicity in setup, the connection is defined to understand these types a
-priori, as well as the basic gob types int, uint, etc.  Their ids are:
+priori, as well as the basic gob types int, uint, etc. Their ids are:

 	bool        1
 	int         2
@@ -250,7 +250,7 @@ priori, as well as the basic gob types int, uint, etc. Their ids are:
 	MapType     23

 Finally, each message created by a call to Encode is preceded by an encoded
-unsigned integer count of the number of bytes remaining in the message.  After
+unsigned integer count of the number of bytes remaining in the message. After
 the initial type name, interface values are wrapped the same way; in effect, the
 interface value acts like a recursive invocation of Encode.

@@ -262,7 +262,7 @@ where * signifies zero or more repetitions and the type id of a value must
 be predefined or be defined before the value in the stream.

 Compatibility: Any future changes to the package will endeavor to maintain
-compatibility with streams encoded using previous versions.  That is, any released
+compatibility with streams encoded using previous versions. That is, any released
 version of this package should be able to decode data written with any previously
 released version, subject to issues such as security fixes. See the Go compatibility
 document for background: https://golang.org/doc/go1compat
@@ -321,7 +321,7 @@ StructValue:
 */

 /*
-For implementers and the curious, here is an encoded example.  Given
+For implementers and the curious, here is an encoded example. Given
 	type Point struct {X, Y int}
 and the value
 	p := Point{22, 33}
@@ -332,14 +332,14 @@ the bytes transmitted that encode p will be:
 They are determined as follows.

 Since this is the first transmission of type Point, the type descriptor
-for Point itself must be sent before the value.  This is the first type
+for Point itself must be sent before the value. This is the first type
 we've sent on this Encoder, so it has type id 65 (0 through 64 are
 reserved).

 	1f	// This item (a type descriptor) is 31 bytes long.
 	ff 81	// The negative of the id for the type we're defining, -65.
 		// This is one byte (indicated by FF = -1) followed by
-		// ^-65<<1 | 1.  The low 1 bit signals to complement the
+		// ^-65<<1 | 1. The low 1 bit signals to complement the
 		// rest upon receipt.

 	// Now we send a type descriptor, which is itself a struct (wireType).
@@ -376,7 +376,7 @@ reserved).
 	00	// end of wireType.structType structure
 	00	// end of wireType structure

-Now we can send the Point value.  Again the field number resets to -1:
+Now we can send the Point value. Again the field number resets to -1:

 	07	// this value is 7 bytes long
 	ff 82	// the type number, 65 (1 byte (-FF) followed by 65<<1)
@@ -393,7 +393,7 @@ output will be just:
 	07 ff 82 01 2c 01 42 00

 A single non-struct value at top level is transmitted like a field with
-delta tag 0.  For instance, a signed integer with value 3 presented as
+delta tag 0. For instance, a signed integer with value 3 presented as
 the argument to Encode will emit:

 	03 04 00 06
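The final example in the diff can be reproduced with a few lines. encodeHex is a hypothetical helper that encodes one value with a fresh Encoder and returns the bytes as hex. For the int value 3, the message is the byte count (03), the type id for int, 2, as a signed integer (04), the zero delta (00), and the value 3 as a signed integer (06).

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeHex encodes v with a new Encoder and returns the emitted bytes
// as space-separated lowercase hex.
func encodeHex(v interface{}) (string, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return "", err
	}
	return fmt.Sprintf("% x", buf.Bytes()), nil
}

func main() {
	s, err := encodeHex(3)
	fmt.Println(s, err) // 03 04 00 06 <nil>
}
```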
