@@ -137,8 +137,10 @@ usecols : array-like or callable, default ``None``
 
     Using this parameter results in much faster parsing time and lower memory usage.
 as_recarray : boolean, default ``False``
-    DEPRECATED: this argument will be removed in a future version. Please call
-    ``pd.read_csv(...).to_records()`` instead.
+
+    .. deprecated:: 0.18.2
+
+       Please call ``pd.read_csv(...).to_records()`` instead.
 
     Return a NumPy recarray instead of a DataFrame after parsing the data. If
     set to ``True``, this option takes precedence over the ``squeeze`` parameter.
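The deprecation above points users at ``DataFrame.to_records``. A minimal sketch of the replacement pattern (the inline CSV here is illustrative, not part of the diff):

```python
# Instead of pd.read_csv(..., as_recarray=True), parse normally and
# convert the resulting DataFrame to a NumPy recarray afterwards.
from io import StringIO

import pandas as pd

data = "a,b\n1,x\n2,y"  # illustrative inline data

records = pd.read_csv(StringIO(data)).to_records()

# Fields are accessible by column name; the index becomes a field too.
print(records["a"])
```

Note that, unlike ``as_recarray=True``, ``to_records()`` includes the index as a field by default; pass ``index=False`` to drop it.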
@@ -191,7 +193,11 @@ skiprows : list-like or integer, default ``None``
 skipfooter : int, default ``0``
     Number of lines at bottom of file to skip (unsupported with engine='c').
 skip_footer : int, default ``0``
-    DEPRECATED: use the ``skipfooter`` parameter instead, as they are identical
+
+    .. deprecated:: 0.19.0
+
+       Use the ``skipfooter`` parameter instead, as they are identical
+
 nrows : int, default ``None``
     Number of rows of file to read. Useful for reading pieces of large files.
 low_memory : boolean, default ``True``
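Since ``skipfooter`` is unsupported by the C parser, its typical use pairs it with ``engine='python'``. A small sketch with made-up inline data:

```python
# skipfooter drops lines from the bottom of the file; it requires the
# Python engine because the C parser does not support it.
from io import StringIO

import pandas as pd

data = "a,b\n1,2\n3,4\ntotal,6"  # illustrative: last line is a footer row

df = pd.read_csv(StringIO(data), skipfooter=1, engine="python")
print(df)
```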
@@ -202,16 +208,25 @@ low_memory : boolean, default ``True``
     use the ``chunksize`` or ``iterator`` parameter to return the data in chunks.
     (Only valid with C parser)
 buffer_lines : int, default None
-    DEPRECATED: this argument will be removed in a future version because its
-    value is not respected by the parser
+
+    .. deprecated:: 0.19.0
+
+       Argument removed because its value is not respected by the parser
+
 compact_ints : boolean, default False
-    DEPRECATED: this argument will be removed in a future version
+
+    .. deprecated:: 0.19.0
+
+       Argument moved to ``pd.to_numeric``
 
     If ``compact_ints`` is ``True``, then for any column that is of integer dtype, the
     parser will attempt to cast it as the smallest integer ``dtype`` possible, either
     signed or unsigned depending on the specification from the ``use_unsigned`` parameter.
 use_unsigned : boolean, default False
-    DEPRECATED: this argument will be removed in a future version
+
+    .. deprecated:: 0.18.2
+
+       Argument moved to ``pd.to_numeric``
 
     If integer columns are being compacted (i.e. ``compact_ints=True``), specify whether
     the column should be compacted to the smallest signed or unsigned integer dtype.
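The deprecation notes above point ``compact_ints``/``use_unsigned`` users at ``pd.to_numeric``; the equivalent downcasting lives in its ``downcast`` argument. A short sketch (the sample values are illustrative):

```python
# pd.to_numeric(downcast=...) casts to the smallest dtype that can hold
# the data: "integer"/"signed", "unsigned", or "float".
import pandas as pd

s = pd.Series([1, 2, 250])  # illustrative values, all fit in 8 bits

small = pd.to_numeric(s, downcast="unsigned")
print(small.dtype)  # uint8
```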
@@ -225,9 +240,9 @@ NA and Missing Data Handling
 
 na_values : scalar, str, list-like, or dict, default ``None``
     Additional strings to recognize as NA/NaN. If dict passed, specific per-column
-    NA values. By default the following values are interpreted as NaN:
-    ``'-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A', 'n/a', 'NA',
-    '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', ''``.
+    NA values. See :ref:`na values const <io.navaluesconst>` below
+    for a list of the values interpreted as NaN by default.
+
 keep_default_na : boolean, default ``True``
     If na_values are specified and keep_default_na is ``False`` the default NaN
     values are overridden, otherwise they're appended to.
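The interaction of ``na_values`` and ``keep_default_na`` described above can be sketched as follows (inline data is illustrative, not from the diff):

```python
# na_values adds NA sentinels (per-column when a dict is passed);
# keep_default_na=False additionally disables the built-in NA list.
from io import StringIO

import pandas as pd

data = "a,b\nmissing,NA\n1,2"  # illustrative inline data

# 'missing' is NaN only in column 'a'; the default list still applies,
# so 'NA' in column 'b' also becomes NaN.
df = pd.read_csv(StringIO(data), na_values={"a": ["missing"]})

# With keep_default_na=False the defaults are overridden, so the
# string 'NA' survives as ordinary text.
df2 = pd.read_csv(StringIO(data), keep_default_na=False, na_values=["missing"])
print(df2["b"].tolist())
```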
@@ -712,6 +727,16 @@ index column inference and discard the last column, pass ``index_col=False``:
    pd.read_csv(StringIO(data))
    pd.read_csv(StringIO(data), index_col=False)
 
+If a subset of data is being parsed using the ``usecols`` option, the
+``index_col`` specification is based on that subset, not the original data.
+
+.. ipython:: python
+
+   data = 'a,b,c\n4,apple,bat,\n8,orange,cow,'
+   print(data)
+   pd.read_csv(StringIO(data), usecols=['b', 'c'])
+   pd.read_csv(StringIO(data), usecols=['b', 'c'], index_col=0)
+
 .. _io.parse_dates:
 
 Date Handling
@@ -1020,10 +1045,11 @@ the corresponding equivalent values will also imply a missing value (in this cas
 ``[5.0,5]`` are recognized as ``NaN``.
 
 To completely override the default values that are recognized as missing, specify ``keep_default_na=False``.
-The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A','N/A', 'NA',
-'#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan']``. Although a 0-length string
-``''`` is not included in the default ``NaN`` values list, it is still treated
-as a missing value.
+
+.. _io.navaluesconst:
+
+The default ``NaN`` recognized values are ``['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A', 'N/A',
+'n/a', 'NA', '#NA', 'NULL', 'null', 'NaN', '-NaN', 'nan', '-nan', '']``.
 
 .. code-block:: python
 
@@ -3396,7 +3422,7 @@ Fixed Format
 This was prior to 0.13.0 the ``Storer`` format.
 
 The examples above show storing using ``put``, which write the HDF5 to ``PyTables`` in a fixed array format, called
-the ``fixed`` format. These types of stores are are **not** appendable once written (though you can simply
+the ``fixed`` format. These types of stores are **not** appendable once written (though you can simply
 remove them and rewrite). Nor are they **queryable**; they must be
 retrieved in their entirety. They also do not support dataframes with non-unique column names.
 The ``fixed`` format stores offer very fast writing and slightly faster reading than ``table`` stores.
@@ -4056,26 +4082,64 @@ Compression
 +++++++++++
 
 ``PyTables`` allows the stored data to be compressed. This applies to
-all kinds of stores, not just tables.
+all kinds of stores, not just tables. Two parameters are used to
+control compression: ``complevel`` and ``complib``.
+
+``complevel`` specifies if and how hard data is to be compressed.
+``complevel=0`` and ``complevel=None`` disables
+compression and ``0<complevel<10`` enables compression.
+
+``complib`` specifies which compression library to use. If nothing is
+specified the default library ``zlib`` is used. A
+compression library usually optimizes for either good
+compression rates or speed and the results will depend on
+the type of data. Which type of
+compression to choose depends on your specific needs and
+data. The list of supported compression libraries:
+
+- `zlib <http://zlib.net/>`_: The default compression library. A classic in terms of compression, achieves good compression rates but is somewhat slow.
+- `lzo <http://www.oberhumer.com/opensource/lzo/>`_: Fast compression and decompression.
+- `bzip2 <http://bzip.org/>`_: Good compression rates.
+- `blosc <http://www.blosc.org/>`_: Fast compression and decompression.
+
+.. versionadded:: 0.20.2
+
+   Support for alternative blosc compressors:
+
+   - `blosc:blosclz <http://www.blosc.org/>`_ This is the
+     default compressor for ``blosc``
+   - `blosc:lz4
+     <https://fastcompression.blogspot.dk/p/lz4.html>`_:
+     A compact, very popular and fast compressor.
+   - `blosc:lz4hc
+     <https://fastcompression.blogspot.dk/p/lz4.html>`_:
+     A tweaked version of LZ4, produces better
+     compression ratios at the expense of speed.
+   - `blosc:snappy <https://google.github.io/snappy/>`_:
+     A popular compressor used in many places.
+   - `blosc:zlib <http://zlib.net/>`_: A classic;
+     somewhat slower than the previous ones, but
+     achieving better compression ratios.
+   - `blosc:zstd <https://facebook.github.io/zstd/>`_: An
+     extremely well balanced codec; it provides the best
+     compression ratios among the others above, and at
+     reasonably fast speed.
+
+If ``complib`` is defined as something other than the
+listed libraries a ``ValueError`` exception is issued.
 
-- Pass ``complevel=int`` for a compression level (1-9, with 0 being no
-  compression, and the default)
-- Pass ``complib=lib`` where lib is any of ``zlib, bzip2, lzo, blosc`` for
-  whichever compression library you prefer.
+.. note::
 
-``HDFStore`` will use the file based compression scheme if no overriding
-``complib`` or ``complevel`` options are provided. ``blosc`` offers very
-fast compression, and is my most used. Note that ``lzo`` and ``bzip2``
-may not be installed (by Python) by default.
+   If the library specified with the ``complib`` option is missing on your platform,
+   compression defaults to ``zlib`` without further ado.
 
-Compression for all objects within the file
+Enable compression for all objects within the file:
 
 .. code-block:: python
 
-   store_compressed = pd.HDFStore('store_compressed.h5', complevel=9, complib='blosc')
+   store_compressed = pd.HDFStore('store_compressed.h5', complevel=9, complib='blosc:blosclz')
 
-Or on-the-fly compression (this only applies to tables). You can turn
-off file compression for a specific table by passing ``complevel=0``
+Or on-the-fly compression (this only applies to tables) in stores where compression is not enabled:
 
 .. code-block:: python
 
@@ -4410,34 +4474,6 @@ Performance
 `Here <http://stackoverflow.com/questions/14355151/how-to-make-pandas-hdfstore-put-operation-faster/14370190#14370190>`__
 for more information and some solutions.
 
-Experimental
-''''''''''''
-
-HDFStore supports ``Panel4D`` storage.
-
-.. ipython:: python
-   :okwarning:
-
-   wp = pd.Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
-                 major_axis=pd.date_range('1/1/2000', periods=5),
-                 minor_axis=['A', 'B', 'C', 'D'])
-   p4d = pd.Panel4D({'l1': wp})
-   p4d
-   store.append('p4d', p4d)
-   store
-
-These, by default, index the three axes ``items, major_axis,
-minor_axis``. On an ``AppendableTable`` it is possible to setup with the
-first append a different indexing scheme, depending on how you want to
-store your data. Pass the ``axes`` keyword with a list of dimensions
-(currently must by exactly 1 less than the total dimensions of the
-object). This cannot be changed after table creation.
-
-.. ipython:: python
-   :okwarning:
-
-   store.append('p4d2', p4d, axes=['labels', 'major_axis', 'minor_axis'])
-   store.select('p4d2', where='labels=l1 and items=Item1 and minor_axis=A')
 
 .. ipython:: python
    :suppress: