diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst index e971f1f28903f..f0e01ddc3fc2d 100644 --- a/doc/source/categorical.rst +++ b/doc/source/categorical.rst @@ -653,7 +653,7 @@ The same applies to ``df.append(df_different)``. Unioning ~~~~~~~~ -.. versionadded:: 0.18.2 +.. versionadded:: 0.19.0 If you want to combine categoricals that do not necessarily have the same categories, the `union_categorical` function will diff --git a/doc/source/merging.rst b/doc/source/merging.rst index b69d0d8ba3015..f14e5741c6e2e 100644 --- a/doc/source/merging.rst +++ b/doc/source/merging.rst @@ -1133,7 +1133,7 @@ fill/interpolate missing data: Merging AsOf ~~~~~~~~~~~~ -.. versionadded:: 0.18.2 +.. versionadded:: 0.19.0 A :func:`merge_asof` is similar to an ordered left-join except that we match on nearest key rather than equal keys. For each row in the ``left`` DataFrame, we select the last row in the ``right`` DataFrame whose ``on`` key is less than the left's key. Both DataFrames must be sorted by the key. diff --git a/doc/source/text.rst b/doc/source/text.rst index 3822c713d7f85..3a4a57ff4da95 100644 --- a/doc/source/text.rst +++ b/doc/source/text.rst @@ -316,7 +316,7 @@ then ``extractall(pat).xs(0, level='match')`` gives the same result as ``Index`` also supports ``.str.extractall``. It returns a ``DataFrame`` which has the same result as a ``Series.str.extractall`` with a default index (starts from 0). -.. versionadded:: 0.18.2 +.. versionadded:: 0.19.0 .. ipython:: python diff --git a/doc/source/whatsnew.rst b/doc/source/whatsnew.rst index 685f1d2086c69..77dc249aeb788 100644 --- a/doc/source/whatsnew.rst +++ b/doc/source/whatsnew.rst @@ -18,7 +18,7 @@ What's New These are new features and improvements of note in each release. -.. include:: whatsnew/v0.18.2.txt +.. include:: whatsnew/v0.19.0.txt .. include:: whatsnew/v0.18.1.txt diff --git a/doc/source/whatsnew/v0.18.2.txt b/doc/source/whatsnew/v0.18.2.txt deleted file mode 100644 index 64644bd9a7a26..0000000000000 --- a/doc/source/whatsnew/v0.18.2.txt +++ /dev/null @@ -1,532 +0,0 @@ -.. _whatsnew_0182: - -v0.18.2 (July ??, 2016) ------------------------ - -This is a minor bug-fix release from 0.18.1 and includes a large number of -bug fixes along with several new features, enhancements, and performance improvements. -We recommend that all users upgrade to this version. - -Highlights include: - -- :func:`merge_asof` for asof-style time-series joining, see :ref:`here ` - -.. contents:: What's new in v0.18.2 - :local: - :backlinks: none - -.. _whatsnew_0182.new_features: - -New features -~~~~~~~~~~~~ - -.. _whatsnew_0182.enhancements.asof_merge: - -:func:`merge_asof` for asof-style time-series joining -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A long-time requested feature has been added through the :func:`merge_asof` function, to -support asof style joining of time-series. (:issue:`1870`). Full documentation is -:ref:`here ` - -The :func:`merge_asof` performs an asof merge, which is similar to a left-join -except that we match on nearest key rather than equal keys. - -.. ipython:: python - - left = pd.DataFrame({'a': [1, 5, 10], - 'left_val': ['a', 'b', 'c']}) - right = pd.DataFrame({'a': [1, 2, 3, 6, 7], - 'right_val': [1, 2, 3, 6, 7]}) - - left - right - -We typically want to match exactly when possible, and use the most -recent value otherwise. - -.. ipython:: python - - pd.merge_asof(left, right, on='a') - -We can also match rows ONLY with prior data, and not an exact match. - -.. ipython:: python - - pd.merge_asof(left, right, on='a', allow_exact_matches=False) - - -In a typical time-series example, we have ``trades`` and ``quotes`` and we want to ``asof-join`` them. -This also illustrates using the ``by`` parameter to group data before merging. - -.. ipython:: python - - trades = pd.DataFrame({ - 'time': pd.to_datetime(['20160525 13:30:00.023', - '20160525 13:30:00.038', - '20160525 13:30:00.048', - '20160525 13:30:00.048', - '20160525 13:30:00.048']), - 'ticker': ['MSFT', 'MSFT', - 'GOOG', 'GOOG', 'AAPL'], - 'price': [51.95, 51.95, - 720.77, 720.92, 98.00], - 'quantity': [75, 155, - 100, 100, 100]}, - columns=['time', 'ticker', 'price', 'quantity']) - - quotes = pd.DataFrame({ - 'time': pd.to_datetime(['20160525 13:30:00.023', - '20160525 13:30:00.023', - '20160525 13:30:00.030', - '20160525 13:30:00.041', - '20160525 13:30:00.048', - '20160525 13:30:00.049', - '20160525 13:30:00.072', - '20160525 13:30:00.075']), - 'ticker': ['GOOG', 'MSFT', 'MSFT', - 'MSFT', 'GOOG', 'AAPL', 'GOOG', - 'MSFT'], - 'bid': [720.50, 51.95, 51.97, 51.99, - 720.50, 97.99, 720.50, 52.01], - 'ask': [720.93, 51.96, 51.98, 52.00, - 720.93, 98.01, 720.88, 52.03]}, - columns=['time', 'ticker', 'bid', 'ask']) - -.. ipython:: python - - trades - quotes - -An asof merge joins on the ``on``, typically a datetimelike field, which is ordered, and -in this case we are using a grouper in the ``by`` field. This is like a left-outer join, except -that forward filling happens automatically taking the most recent non-NaN value. - -.. ipython:: python - - pd.merge_asof(trades, quotes, - on='time', - by='ticker') - -This returns a merged DataFrame with the entries in the same order as the original left -passed DataFrame (``trades`` in this case), with the fields of the ``quotes`` merged. - -.. _whatsnew_0182.enhancements.read_csv_dupe_col_names_support: - -:func:`read_csv` has improved support for duplicate column names -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -:ref:`Duplicate column names ` are now supported in :func:`read_csv` whether -they are in the file or passed in as the ``names`` parameter (:issue:`7160`, :issue:`9424`) - -.. ipython :: python - - data = '0,1,2\n3,4,5' - names = ['a', 'b', 'a'] - -Previous behaviour: - -.. code-block:: ipython - - In [2]: pd.read_csv(StringIO(data), names=names) - Out[2]: - a b a - 0 2 1 2 - 1 5 4 5 - -The first 'a' column contains the same data as the second 'a' column, when it should have -contained the array ``[0, 3]``. - -New behaviour: - -.. ipython :: python - - In [2]: pd.read_csv(StringIO(data), names=names) - -.. _whatsnew_0182.enhancements.semi_month_offsets: - -Semi-Month Offsets -^^^^^^^^^^^^^^^^^^ - -Pandas has gained new frequency offsets, ``SemiMonthEnd`` ('SM') and ``SemiMonthBegin`` ('SMS'). -These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month respectively. -(:issue:`1543`) - -.. ipython:: python - - from pandas.tseries.offsets import SemiMonthEnd, SemiMonthBegin - -SemiMonthEnd: - -.. ipython:: python - - Timestamp('2016-01-01') + SemiMonthEnd() - - pd.date_range('2015-01-01', freq='SM', periods=4) - -SemiMonthBegin: - -.. ipython:: python - - Timestamp('2016-01-01') + SemiMonthBegin() - - pd.date_range('2015-01-01', freq='SMS', periods=4) - -Using the anchoring suffix, you can also specify the day of month to use instead of the 15th. - -.. ipython:: python - - pd.date_range('2015-01-01', freq='SMS-16', periods=4) - - pd.date_range('2015-01-01', freq='SM-14', periods=4) - -.. _whatsnew_0182.enhancements.other: - -Other enhancements -^^^^^^^^^^^^^^^^^^ - -- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behaviour remains to raising a ``NonExistentTimeError`` (:issue:`13057`) - -- ``Index`` now supports ``.str.extractall()`` which returns a ``DataFrame``, see :ref:`documentation here ` (:issue:`10008`, :issue:`13156`) -- ``.to_hdf/read_hdf()`` now accept path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path (:issue:`11773`) - - .. ipython:: python - - idx = pd.Index(["a1a2", "b1", "c1"]) - idx.str.extractall("[ab](?P\d)") - -- ``Timestamp`` s can now accept positional and keyword parameters like :func:`datetime.datetime` (:issue:`10758`, :issue:`11630`) - - .. ipython:: python - - pd.Timestamp(2012, 1, 1) - - pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30) - -- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``decimal`` option (:issue:`12933`) -- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``na_filter`` option (:issue:`13321`) -- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``memory_map`` option (:issue:`13381`) - -- ``Index.astype()`` now accepts an optional boolean argument ``copy``, which allows optional copying if the requirements on dtype are satisfied (:issue:`13209`) -- ``Index`` now supports the ``.where()`` function for same shape indexing (:issue:`13170`) - - .. ipython:: python - - idx = pd.Index(['a', 'b', 'c']) - idx.where([True, False, True]) - -- ``Categorical.astype()`` now accepts an optional boolean argument ``copy``, effective when dtype is categorical (:issue:`13209`) -- ``DataFrame`` has gained the ``.asof()`` method to return the last non-NaN values according to the selected subset (:issue:`13358`) -- Consistent with the Python API, ``pd.read_csv()`` will now interpret ``+inf`` as positive infinity (:issue:`13274`) -- The ``DataFrame`` constructor will now respect key ordering if a list of ``OrderedDict`` objects are passed in (:issue:`13304`) -- ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`) -- A ``union_categorical`` function has been added for combining categoricals, see :ref:`Unioning Categoricals` (:issue:`13361`) -- ``eval``'s upcasting rules for ``float32`` types have been updated to be more consistent with NumPy's rules. New behavior will not upcast to ``float64`` if you multiply a pandas ``float32`` object by a scalar float64. (:issue:`12388`) -- ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`) - -.. _whatsnew_0182.api: - -API changes -~~~~~~~~~~~ - - -- Non-convertible dates in an excel date column will be returned without conversion and the column will be ``object`` dtype, rather than raising an exception (:issue:`10001`) -- An ``UnsupportedFunctionCall`` error is now raised if NumPy ufuncs like ``np.mean`` are called on groupby or resample objects (:issue:`12811`) -- Calls to ``.sample()`` will respect the random seed set via ``numpy.random.seed(n)`` (:issue:`13161`) -- ``Styler.apply`` is now more strict about the outputs your function must return. For ``axis=0`` or ``axis=1``, the output shape must be identical. For ``axis=None``, the output must be a DataFrame with identical columns and index labels. (:issue:`13222`) - -.. _whatsnew_0182.api.tolist: - -``Series.tolist()`` will now return Python types -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behaviour (:issue:`10904`) - - -.. ipython:: python - - s = pd.Series([1,2,3]) - type(s.tolist()[0]) - -Previous Behavior: - -.. code-block:: ipython - - In [7]: type(s.tolist()[0]) - Out[7]: - - -New Behavior: - -.. ipython:: python - - type(s.tolist()[0]) - -.. _whatsnew_0182.api.promote: - -``Series`` type promotion on assignment -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A ``Series`` will now correctly promote its dtype for assignment with incompat values to the current dtype (:issue:`13234`) - - -.. ipython:: python - - s = pd.Series() - -Previous Behavior: - -.. code-block:: ipython - - In [2]: s["a"] = pd.Timestamp("2016-01-01") - - In [3]: s["b"] = 3.0 - TypeError: invalid type promotion - -New Behavior: - -.. ipython:: python - - s["a"] = pd.Timestamp("2016-01-01") - s["b"] = 3.0 - s - s.dtype - -.. _whatsnew_0182.api.to_datetime_coerce: - -``.to_datetime()`` when coercing -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A bug is fixed in ``.to_datetime()`` when passing integers or floats, and no ``unit`` and ``errors='coerce'`` (:issue:`13180`). -Previously if ``.to_datetime()`` encountered mixed integers/floats and strings, but no datetimes with ``errors='coerce'`` it would convert all to ``NaT``. - -Previous Behavior: - -.. code-block:: ipython - - In [2]: pd.to_datetime([1, 'foo'], errors='coerce') - Out[2]: DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None) - -This will now convert integers/floats with the default unit of ``ns``. - -.. ipython:: python - - pd.to_datetime([1, 'foo'], errors='coerce') - -.. _whatsnew_0182.api.merging: - -Merging changes -^^^^^^^^^^^^^^^ - -Merging will now preserve the dtype of the join keys (:issue:`8596`) - -.. ipython:: python - - df1 = pd.DataFrame({'key': [1], 'v1': [10]}) - df1 - df2 = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]}) - df2 - -Previous Behavior: - -.. code-block:: ipython - - In [5]: pd.merge(df1, df2, how='outer') - Out[5]: - key v1 - 0 1.0 10.0 - 1 1.0 20.0 - 2 2.0 30.0 - - In [6]: pd.merge(df1, df2, how='outer').dtypes - Out[6]: - key float64 - v1 float64 - dtype: object - -New Behavior: - -We are able to preserve the join keys - -.. ipython:: python - - pd.merge(df1, df2, how='outer') - pd.merge(df1, df2, how='outer').dtypes - -Of course if you have missing values that are introduced, then the -resulting dtype will be upcast (unchanged from previous). - -.. ipython:: python - - pd.merge(df1, df2, how='outer', on='key') - pd.merge(df1, df2, how='outer', on='key').dtypes - -.. _whatsnew_0182.describe: - -``.describe()`` changes -^^^^^^^^^^^^^^^^^^^^^^^ - -Percentile identifiers in the index of a ``.describe()`` output will now be rounded to the least precision that keeps them distinct (:issue:`13104`) - -.. ipython:: python - - s = pd.Series([0, 1, 2, 3, 4]) - df = pd.DataFrame([0, 1, 2, 3, 4]) - -Previous Behavior: - -The percentiles were rounded to at most one decimal place, which could raise ``ValueError`` for a data frame if the percentiles were duplicated. - -.. code-block:: ipython - - In [3]: s.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) - Out[3]: - count 5.000000 - mean 2.000000 - std 1.581139 - min 0.000000 - 0.0% 0.000400 - 0.1% 0.002000 - 0.1% 0.004000 - 50% 2.000000 - 99.9% 3.996000 - 100.0% 3.998000 - 100.0% 3.999600 - max 4.000000 - dtype: float64 - - In [4]: df.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) - Out[4]: - ... - ValueError: cannot reindex from a duplicate axis - -New Behavior: - -.. ipython:: python - - s.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) - df.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) - -Furthermore: - -- Passing duplicated ``percentiles`` will now raise a ``ValueError``. -- Bug in ``.describe()`` on a DataFrame with a mixed-dtype column index, which would previously raise a ``TypeError`` (:issue:`13288`) - -.. _whatsnew_0182.api.other: - -Other API changes -^^^^^^^^^^^^^^^^^ - -- ``Float64Index.astype(int)`` will now raise ``ValueError`` if ``Float64Index`` contains ``NaN`` values (:issue:`13149`) -- ``TimedeltaIndex.astype(int)`` and ``DatetimeIndex.astype(int)`` will now return ``Int64Index`` instead of ``np.array`` (:issue:`13209`) -- ``.filter()`` enforces mutual exclusion of the keyword arguments. (:issue:`12399`) -- ``PeridIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`) -- ``__setitem__`` will no longer apply a callable rhs as a function instead of storing it. Call ``where`` directly to get the previous behavior. (:issue:`13299`) - -.. _whatsnew_0182.deprecations: - -Deprecations -^^^^^^^^^^^^ - -- ``compact_ints`` and ``use_unsigned`` have been deprecated in ``pd.read_csv()`` and will be removed in a future version (:issue:`13320`) -- ``buffer_lines`` has been deprecated in ``pd.read_csv()`` and will be removed in a future version (:issue:`13360`) -- ``as_recarray`` has been deprecated in ``pd.read_csv()`` and will be removed in a future version (:issue:`13373`) -- top-level ``pd.ordered_merge()`` has been renamed to ``pd.merge_ordered()`` and the original name will be removed in a future version (:issue:`13358`) - -.. _whatsnew_0182.performance: - -Performance Improvements -~~~~~~~~~~~~~~~~~~~~~~~~ - -- Improved performance of sparse ``IntIndex.intersect`` (:issue:`13082`) -- Improved performance of sparse arithmetic with ``BlockIndex`` when the number of blocks are large, though recommended to use ``IntIndex`` in such cases (:issue:`13082`) -- increased performance of ``DataFrame.quantile()`` as it now operates per-block (:issue:`11623`) - -- Improved performance of float64 hash table operations, fixing some very slow indexing and groupby operations in python 3 (:issue:`13166`, :issue:`13334`) -- Improved performance of ``DataFrameGroupBy.transform`` (:issue:`12737`) - - -.. _whatsnew_0182.bug_fixes: - -Bug Fixes -~~~~~~~~~ - -- Bug in ``io.json.json_normalize()``, where non-ascii keys raised an exception (:issue:`13213`) -- Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing may raise ``IndexError`` (:issue:`13144`) -- Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing result may have normal ``Index`` (:issue:`13144`) -- Bug in ``SparseDataFrame`` in which ``axis=None`` did not default to ``axis=0`` (:issue:`13048`) -- Bug in ``SparseSeries`` and ``SparseDataFrame`` creation with ``object`` dtype may raise ``TypeError`` (:issue:`11633`) -- Bug when passing a not-default-indexed ``Series`` as ``xerr`` or ``yerr`` in ``.plot()`` (:issue:`11858`) -- Bug in matplotlib ``AutoDataFormatter``; this restores the second scaled formatting and re-adds micro-second scaled formatting (:issue:`13131`) -- Bug in selection from a ``HDFStore`` with a fixed format and ``start`` and/or ``stop`` specified will now return the selected range (:issue:`8287`) - - -- Bug in ``.groupby(..).resample(..)`` when the same object is called multiple times (:issue:`13174`) -- Bug in ``.to_records()`` when index name is a unicode string (:issue:`13172`) - -- Bug in calling ``.memory_usage()`` on object which doesn't implement (:issue:`12924`) - -- Regression in ``Series.quantile`` with nans (also shows up in ``.median()`` and ``.describe()`` ); furthermore now names the ``Series`` with the quantile (:issue:`13098`, :issue:`13146`) - -- Bug in ``SeriesGroupBy.transform`` with datetime values and missing groups (:issue:`13191`) - -- Bug in ``Series.str.extractall()`` with ``str`` index raises ``ValueError`` (:issue:`13156`) -- Bug in ``Series.str.extractall()`` with single group and quantifier (:issue:`13382`) - - -- Bug in ``PeriodIndex`` and ``Period`` subtraction raises ``AttributeError`` (:issue:`13071`) -- Bug in ``PeriodIndex`` construction returning a ``float64`` index in some circumstances (:issue:`13067`) -- Bug in ``.resample(..)`` with a ``PeriodIndex`` not changing its ``freq`` appropriately when empty (:issue:`13067`) -- Bug in ``.resample(..)`` with a ``PeriodIndex`` not retaining its type or name with an empty ``DataFrame`` appropriately when empty (:issue:`13212`) -- Bug in ``groupby(..).resample(..)`` where passing some keywords would raise an exception (:issue:`13235`) -- Bug in ``.tz_convert`` on a tz-aware ``DateTimeIndex`` that relied on index being sorted for correct results (:issue:`13306`) -- Bug in ``pd.read_hdf()`` where attempting to load an HDF file with a single dataset, that had one or more categorical columns, failed unless the key argument was set to the name of the dataset. (:issue:`13231`) -- Bug in ``.rolling()`` that allowed a negative integer window in contruction of the ``Rolling()`` object, but would later fail on aggregation (:issue:`13383`) - -- Bug in various index types, which did not propagate the name of passed index (:issue:`12309`) -- Bug in ``DatetimeIndex``, which did not honour the ``copy=True`` (:issue:`13205`) -- Bug in ``DatetimeIndex.is_normalized`` returns incorrectly for normalized date_range in case of local timezones (:issue:`13459`) - -- Bug in ``DataFrame.to_csv()`` in which float values were being quoted even though quotations were specified for non-numeric values only (:issue:`12922`, :issue:`13259`) -- Bug in ``MultiIndex`` slicing where extra elements were returned when level is non-unique (:issue:`12896`) -- Bug in ``.str.replace`` does not raise ``TypeError`` for invalid replacement (:issue:`13438`) - - -- Bug in ``pd.read_csv()`` with ``engine='python'`` in which ``NaN`` values weren't being detected after data was converted to numeric values (:issue:`13314`) -- Bug in ``pd.read_csv()`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`) -- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`) -- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing ``NaN`` values were not being parsed (:issue:`13320`) -- Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a tempfile.TemporaryFile on Windows with Python 3 (:issue:`13398`) -- Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`) -- Bug in ``pd.read_csv()`` that prevents ``usecols`` from being an empty set (:issue:`13402`) -- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which null ``quotechar`` was not accepted even though ``quoting`` was specified as ``None`` (:issue:`13411`) -- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which fields were not properly cast to float when quoting was specified as non-numeric (:issue:`13411`) -- Bug in ``pd.pivot_table()`` where ``margins_name`` is ignored when ``aggfunc`` is a list (:issue:`13354`) - - - -- Bug in ``Series`` arithmetic raises ``TypeError`` if it contains datetime-like as ``object`` dtype (:issue:`13043`) - - -- Bug in ``pd.to_datetime()`` when passing invalid datatypes (e.g. bool); will now respect the ``errors`` keyword (:issue:`13176`) -- Bug in ``pd.to_datetime()`` which overflowed on ``int8``, `int16`` dtypes (:issue:`13451`) -- Bug in extension dtype creation where the created types were not is/identical (:issue:`13285`) - -- Bug in ``NaT`` - ``Period`` raises ``AttributeError`` (:issue:`13071`) -- Bug in ``Period`` addition raises ``TypeError`` if ``Period`` is on right hand side (:issue:`13069`) -- Bug in ``Peirod`` and ``Series`` or ``Index`` comparison raises ``TypeError`` (:issue:`13200`) -- Bug in ``pd.set_eng_float_format()`` that would prevent NaN's from formatting (:issue:`11981`) -- Bug in ``.unstack`` with ``Categorical`` dtype resets ``.ordered`` to ``True`` (:issue:`13249`) - - -- Bug in ``Series`` comparison operators when dealing with zero dim NumPy arrays (:issue:`13006`) -- Bug in ``groupby`` where ``apply`` returns different result depending on whether first result is ``None`` or not (:issue:`12824`) -- Bug in ``groupby(..).nth()`` where the group key is included inconsistently if called after ``.head()/.tail()`` (:issue:`12839`) - -- Bug in ``pd.to_numeric`` when ``errors='coerce'`` and input contains non-hashable objects (:issue:`13324`) - - -- Bug in ``Categorical.remove_unused_categories()`` changes ``.codes`` dtype to platform int (:issue:`13261`) -- Bug in ``groupby`` with ``as_index=False`` returns all NaN's when grouping on multiple columns including a categorical one (:issue:`13204`) - -- Bug where ``pd.read_gbq()`` could throw ``ImportError: No module named discovery`` as a result of a naming conflict with another python package called apiclient (:issue:`13454`) diff --git a/doc/source/whatsnew/v0.19.0.txt b/doc/source/whatsnew/v0.19.0.txt index 42db0388ca5d9..70d54ea0d364d 100644 --- a/doc/source/whatsnew/v0.19.0.txt +++ b/doc/source/whatsnew/v0.19.0.txt @@ -1,7 +1,7 @@ .. _whatsnew_0190: -v0.19.0 (????, 2016) --------------------- +v0.19.0 (August ??, 2016) +------------------------- This is a major release from 0.18.2 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all @@ -9,75 +9,524 @@ users upgrade to this version. Highlights include: +- :func:`merge_asof` for asof-style time-series joining, see :ref:`here ` -Check the :ref:`API Changes ` and :ref:`deprecations ` before updating. - -.. contents:: What's new in v0.19.0 +.. contents:: What's new in v0.18.2 :local: :backlinks: none -.. _whatsnew_0190.enhancements: +.. _whatsnew_0190.new_features: New features ~~~~~~~~~~~~ +.. _whatsnew_0190.enhancements.asof_merge: + +:func:`merge_asof` for asof-style time-series joining +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A long-time requested feature has been added through the :func:`merge_asof` function, to +support asof style joining of time-series. (:issue:`1870`). Full documentation is +:ref:`here ` + +The :func:`merge_asof` performs an asof merge, which is similar to a left-join +except that we match on nearest key rather than equal keys. + +.. ipython:: python + + left = pd.DataFrame({'a': [1, 5, 10], + 'left_val': ['a', 'b', 'c']}) + right = pd.DataFrame({'a': [1, 2, 3, 6, 7], + 'right_val': [1, 2, 3, 6, 7]}) + + left + right + +We typically want to match exactly when possible, and use the most +recent value otherwise. + +.. ipython:: python + + pd.merge_asof(left, right, on='a') + +We can also match rows ONLY with prior data, and not an exact match. + +.. ipython:: python + + pd.merge_asof(left, right, on='a', allow_exact_matches=False) + + +In a typical time-series example, we have ``trades`` and ``quotes`` and we want to ``asof-join`` them. +This also illustrates using the ``by`` parameter to group data before merging. + +.. ipython:: python + + trades = pd.DataFrame({ + 'time': pd.to_datetime(['20160525 13:30:00.023', + '20160525 13:30:00.038', + '20160525 13:30:00.048', + '20160525 13:30:00.048', + '20160525 13:30:00.048']), + 'ticker': ['MSFT', 'MSFT', + 'GOOG', 'GOOG', 'AAPL'], + 'price': [51.95, 51.95, + 720.77, 720.92, 98.00], + 'quantity': [75, 155, + 100, 100, 100]}, + columns=['time', 'ticker', 'price', 'quantity']) + + quotes = pd.DataFrame({ + 'time': pd.to_datetime(['20160525 13:30:00.023', + '20160525 13:30:00.023', + '20160525 13:30:00.030', + '20160525 13:30:00.041', + '20160525 13:30:00.048', + '20160525 13:30:00.049', + '20160525 13:30:00.072', + '20160525 13:30:00.075']), + 'ticker': ['GOOG', 'MSFT', 'MSFT', + 'MSFT', 'GOOG', 'AAPL', 'GOOG', + 'MSFT'], + 'bid': [720.50, 51.95, 51.97, 51.99, + 720.50, 97.99, 720.50, 52.01], + 'ask': [720.93, 51.96, 51.98, 52.00, + 720.93, 98.01, 720.88, 52.03]}, + columns=['time', 'ticker', 'bid', 'ask']) + +.. ipython:: python + + trades + quotes + +An asof merge joins on the ``on``, typically a datetimelike field, which is ordered, and +in this case we are using a grouper in the ``by`` field. This is like a left-outer join, except +that forward filling happens automatically taking the most recent non-NaN value. + +.. ipython:: python + + pd.merge_asof(trades, quotes, + on='time', + by='ticker') + +This returns a merged DataFrame with the entries in the same order as the original left +passed DataFrame (``trades`` in this case), with the fields of the ``quotes`` merged. + +.. _whatsnew_0190.enhancements.read_csv_dupe_col_names_support: + +:func:`read_csv` has improved support for duplicate column names +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:ref:`Duplicate column names ` are now supported in :func:`read_csv` whether +they are in the file or passed in as the ``names`` parameter (:issue:`7160`, :issue:`9424`) + +.. ipython :: python + + data = '0,1,2\n3,4,5' + names = ['a', 'b', 'a'] + +Previous behaviour: + +.. code-block:: ipython + + In [2]: pd.read_csv(StringIO(data), names=names) + Out[2]: + a b a + 0 2 1 2 + 1 5 4 5 + +The first 'a' column contains the same data as the second 'a' column, when it should have +contained the array ``[0, 3]``. + +New behaviour: + +.. ipython :: python + + In [2]: pd.read_csv(StringIO(data), names=names) + +.. _whatsnew_0190.enhancements.semi_month_offsets: +Semi-Month Offsets +^^^^^^^^^^^^^^^^^^ + +Pandas has gained new frequency offsets, ``SemiMonthEnd`` ('SM') and ``SemiMonthBegin`` ('SMS'). +These provide date offsets anchored (by default) to the 15th and end of month, and 15th and 1st of month respectively. +(:issue:`1543`) + +.. ipython:: python + + from pandas.tseries.offsets import SemiMonthEnd, SemiMonthBegin + +SemiMonthEnd: + +.. ipython:: python + + Timestamp('2016-01-01') + SemiMonthEnd() + + pd.date_range('2015-01-01', freq='SM', periods=4) + +SemiMonthBegin: + +.. ipython:: python + Timestamp('2016-01-01') + SemiMonthBegin() + pd.date_range('2015-01-01', freq='SMS', periods=4) + +Using the anchoring suffix, you can also specify the day of month to use instead of the 15th. + +.. ipython:: python + + pd.date_range('2015-01-01', freq='SMS-16', periods=4) + + pd.date_range('2015-01-01', freq='SM-14', periods=4) .. _whatsnew_0190.enhancements.other: Other enhancements ^^^^^^^^^^^^^^^^^^ +- The ``.tz_localize()`` method of ``DatetimeIndex`` and ``Timestamp`` has gained the ``errors`` keyword, so you can potentially coerce nonexistent timestamps to ``NaT``. The default behaviour remains to raising a ``NonExistentTimeError`` (:issue:`13057`) + +- ``Index`` now supports ``.str.extractall()`` which returns a ``DataFrame``, see :ref:`documentation here ` (:issue:`10008`, :issue:`13156`) +- ``.to_hdf/read_hdf()`` now accept path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path (:issue:`11773`) + + .. ipython:: python + + idx = pd.Index(["a1a2", "b1", "c1"]) + idx.str.extractall("[ab](?P\d)") + +- ``Timestamp`` s can now accept positional and keyword parameters like :func:`datetime.datetime` (:issue:`10758`, :issue:`11630`) + + .. ipython:: python + pd.Timestamp(2012, 1, 1) + pd.Timestamp(year=2012, month=1, day=1, hour=8, minute=30) +- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``decimal`` option (:issue:`12933`) +- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``na_filter`` option (:issue:`13321`) +- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``memory_map`` option (:issue:`13381`) +- ``Index.astype()`` now accepts an optional boolean argument ``copy``, which allows optional copying if the requirements on dtype are satisfied (:issue:`13209`) +- ``Index`` now supports the ``.where()`` function for same shape indexing (:issue:`13170`) -.. _whatsnew_0190.api_breaking: + .. ipython:: python -Backwards incompatible API changes -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + idx = pd.Index(['a', 'b', 'c']) + idx.where([True, False, True]) + +- ``Categorical.astype()`` now accepts an optional boolean argument ``copy``, effective when dtype is categorical (:issue:`13209`) +- ``DataFrame`` has gained the ``.asof()`` method to return the last non-NaN values according to the selected subset (:issue:`13358`) +- Consistent with the Python API, ``pd.read_csv()`` will now interpret ``+inf`` as positive infinity (:issue:`13274`) +- The ``DataFrame`` constructor will now respect key ordering if a list of ``OrderedDict`` objects are passed in (:issue:`13304`) +- ``pd.read_html()`` has gained support for the ``decimal`` option (:issue:`12907`) +- A ``union_categorical`` function has been added for combining categoricals, see :ref:`Unioning Categoricals` (:issue:`13361`) +- ``eval``'s upcasting rules for ``float32`` types have been updated to be more consistent with NumPy's rules. New behavior will not upcast to ``float64`` if you multiply a pandas ``float32`` object by a scalar float64. (:issue:`12388`) +- ``Series`` has gained the properties ``.is_monotonic``, ``.is_monotonic_increasing``, ``.is_monotonic_decreasing``, similar to ``Index`` (:issue:`13336`) .. _whatsnew_0190.api: +API changes +~~~~~~~~~~~ +- Non-convertible dates in an excel date column will be returned without conversion and the column will be ``object`` dtype, rather than raising an exception (:issue:`10001`) +- An ``UnsupportedFunctionCall`` error is now raised if NumPy ufuncs like ``np.mean`` are called on groupby or resample objects (:issue:`12811`) +- Calls to ``.sample()`` will respect the random seed set via ``numpy.random.seed(n)`` (:issue:`13161`) +- ``Styler.apply`` is now more strict about the outputs your function must return. For ``axis=0`` or ``axis=1``, the output shape must be identical. For ``axis=None``, the output must be a DataFrame with identical columns and index labels. (:issue:`13222`) +.. _whatsnew_0190.api.tolist: +``Series.tolist()`` will now return Python types +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Other API Changes -^^^^^^^^^^^^^^^^^ +``Series.tolist()`` will now return Python types in the output, mimicking NumPy ``.tolist()`` behaviour (:issue:`10904`) -.. _whatsnew_0190.deprecations: -Deprecations -^^^^^^^^^^^^ +.. ipython:: python + + s = pd.Series([1,2,3]) + type(s.tolist()[0]) + +Previous Behavior: + +.. code-block:: ipython + + In [7]: type(s.tolist()[0]) + Out[7]: + + +New Behavior: + +.. ipython:: python + + type(s.tolist()[0]) + +.. _whatsnew_0190.api.promote: + +``Series`` type promotion on assignment +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A ``Series`` will now correctly promote its dtype for assignment with incompat values to the current dtype (:issue:`13234`) + + +.. ipython:: python + + s = pd.Series() +Previous Behavior: +.. code-block:: ipython + In [2]: s["a"] = pd.Timestamp("2016-01-01") + In [3]: s["b"] = 3.0 + TypeError: invalid type promotion -.. _whatsnew_0190.prior_deprecations: +New Behavior: -Removal of prior version deprecations/changes -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +.. ipython:: python + s["a"] = pd.Timestamp("2016-01-01") + s["b"] = 3.0 + s + s.dtype +.. _whatsnew_0190.api.to_datetime_coerce: +``.to_datetime()`` when coercing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +A bug is fixed in ``.to_datetime()`` when passing integers or floats, and no ``unit`` and ``errors='coerce'`` (:issue:`13180`). +Previously if ``.to_datetime()`` encountered mixed integers/floats and strings, but no datetimes with ``errors='coerce'`` it would convert all to ``NaT``. + +Previous Behavior: + +.. code-block:: ipython + + In [2]: pd.to_datetime([1, 'foo'], errors='coerce') + Out[2]: DatetimeIndex(['NaT', 'NaT'], dtype='datetime64[ns]', freq=None) + +This will now convert integers/floats with the default unit of ``ns``. + +.. ipython:: python + + pd.to_datetime([1, 'foo'], errors='coerce') + +.. _whatsnew_0190.api.merging: + +Merging changes +^^^^^^^^^^^^^^^ + +Merging will now preserve the dtype of the join keys (:issue:`8596`) + +.. ipython:: python + + df1 = pd.DataFrame({'key': [1], 'v1': [10]}) + df1 + df2 = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]}) + df2 + +Previous Behavior: + +.. code-block:: ipython + + In [5]: pd.merge(df1, df2, how='outer') + Out[5]: + key v1 + 0 1.0 10.0 + 1 1.0 20.0 + 2 2.0 30.0 + + In [6]: pd.merge(df1, df2, how='outer').dtypes + Out[6]: + key float64 + v1 float64 + dtype: object + +New Behavior: + +We are able to preserve the join keys + +.. ipython:: python + + pd.merge(df1, df2, how='outer') + pd.merge(df1, df2, how='outer').dtypes + +Of course if you have missing values that are introduced, then the +resulting dtype will be upcast (unchanged from previous). + +.. ipython:: python + + pd.merge(df1, df2, how='outer', on='key') + pd.merge(df1, df2, how='outer', on='key').dtypes + +.. _whatsnew_0190.describe: + +``.describe()`` changes +^^^^^^^^^^^^^^^^^^^^^^^ + +Percentile identifiers in the index of a ``.describe()`` output will now be rounded to the least precision that keeps them distinct (:issue:`13104`) + +.. ipython:: python + + s = pd.Series([0, 1, 2, 3, 4]) + df = pd.DataFrame([0, 1, 2, 3, 4]) + +Previous Behavior: + +The percentiles were rounded to at most one decimal place, which could raise ``ValueError`` for a data frame if the percentiles were duplicated. + +.. code-block:: ipython + + In [3]: s.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) + Out[3]: + count 5.000000 + mean 2.000000 + std 1.581139 + min 0.000000 + 0.0% 0.000400 + 0.1% 0.002000 + 0.1% 0.004000 + 50% 2.000000 + 99.9% 3.996000 + 100.0% 3.998000 + 100.0% 3.999600 + max 4.000000 + dtype: float64 + + In [4]: df.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) + Out[4]: + ... + ValueError: cannot reindex from a duplicate axis + +New Behavior: + +.. ipython:: python + + s.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) + df.describe(percentiles=[0.0001, 0.0005, 0.001, 0.999, 0.9995, 0.9999]) + +Furthermore: + +- Passing duplicated ``percentiles`` will now raise a ``ValueError``. +- Bug in ``.describe()`` on a DataFrame with a mixed-dtype column index, which would previously raise a ``TypeError`` (:issue:`13288`) + +.. _whatsnew_0190.api.other: + +Other API changes +^^^^^^^^^^^^^^^^^ + +- ``Float64Index.astype(int)`` will now raise ``ValueError`` if ``Float64Index`` contains ``NaN`` values (:issue:`13149`) +- ``TimedeltaIndex.astype(int)`` and ``DatetimeIndex.astype(int)`` will now return ``Int64Index`` instead of ``np.array`` (:issue:`13209`) +- ``.filter()`` enforces mutual exclusion of the keyword arguments. (:issue:`12399`) +- ``PeridIndex`` can now accept ``list`` and ``array`` which contains ``pd.NaT`` (:issue:`13430`) +- ``__setitem__`` will no longer apply a callable rhs as a function instead of storing it. Call ``where`` directly to get the previous behavior. (:issue:`13299`) + +.. _whatsnew_0190.deprecations: + +Deprecations +^^^^^^^^^^^^ + +- ``compact_ints`` and ``use_unsigned`` have been deprecated in ``pd.read_csv()`` and will be removed in a future version (:issue:`13320`) +- ``buffer_lines`` has been deprecated in ``pd.read_csv()`` and will be removed in a future version (:issue:`13360`) +- ``as_recarray`` has been deprecated in ``pd.read_csv()`` and will be removed in a future version (:issue:`13373`) +- top-level ``pd.ordered_merge()`` has been renamed to ``pd.merge_ordered()`` and the original name will be removed in a future version (:issue:`13358`) .. _whatsnew_0190.performance: Performance Improvements ~~~~~~~~~~~~~~~~~~~~~~~~ +- Improved performance of sparse ``IntIndex.intersect`` (:issue:`13082`) +- Improved performance of sparse arithmetic with ``BlockIndex`` when the number of blocks are large, though recommended to use ``IntIndex`` in such cases (:issue:`13082`) +- increased performance of ``DataFrame.quantile()`` as it now operates per-block (:issue:`11623`) - +- Improved performance of float64 hash table operations, fixing some very slow indexing and groupby operations in python 3 (:issue:`13166`, :issue:`13334`) +- Improved performance of ``DataFrameGroupBy.transform`` (:issue:`12737`) .. _whatsnew_0190.bug_fixes: Bug Fixes ~~~~~~~~~ + +- Bug in ``io.json.json_normalize()``, where non-ascii keys raised an exception (:issue:`13213`) +- Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing may raise ``IndexError`` (:issue:`13144`) +- Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing result may have normal ``Index`` (:issue:`13144`) +- Bug in ``SparseDataFrame`` in which ``axis=None`` did not default to ``axis=0`` (:issue:`13048`) +- Bug in ``SparseSeries`` and ``SparseDataFrame`` creation with ``object`` dtype may raise ``TypeError`` (:issue:`11633`) +- Bug when passing a not-default-indexed ``Series`` as ``xerr`` or ``yerr`` in ``.plot()`` (:issue:`11858`) +- Bug in matplotlib ``AutoDataFormatter``; this restores the second scaled formatting and re-adds micro-second scaled formatting (:issue:`13131`) +- Bug in selection from a ``HDFStore`` with a fixed format and ``start`` and/or ``stop`` specified will now return the selected range (:issue:`8287`) + + +- Bug in ``.groupby(..).resample(..)`` when the same object is called multiple times (:issue:`13174`) +- Bug in ``.to_records()`` when index name is a unicode string (:issue:`13172`) + +- Bug in calling ``.memory_usage()`` on object which doesn't implement (:issue:`12924`) + +- Regression in ``Series.quantile`` with nans (also shows up in ``.median()`` and ``.describe()`` ); furthermore now names the ``Series`` with the quantile (:issue:`13098`, :issue:`13146`) + +- Bug in ``SeriesGroupBy.transform`` with datetime values and missing groups (:issue:`13191`) + +- Bug in ``Series.str.extractall()`` with ``str`` index raises ``ValueError`` (:issue:`13156`) +- Bug in ``Series.str.extractall()`` with single group and quantifier (:issue:`13382`) + + +- Bug in ``PeriodIndex`` and ``Period`` subtraction raises ``AttributeError`` (:issue:`13071`) +- Bug in ``PeriodIndex`` construction returning a ``float64`` index in some circumstances (:issue:`13067`) +- Bug in ``.resample(..)`` with a ``PeriodIndex`` not changing its ``freq`` appropriately when empty (:issue:`13067`) +- Bug in ``.resample(..)`` with a ``PeriodIndex`` not retaining its type or name with an empty ``DataFrame`` appropriately when empty (:issue:`13212`) +- Bug in ``groupby(..).resample(..)`` where passing some keywords would raise an exception (:issue:`13235`) +- Bug in ``.tz_convert`` on a tz-aware ``DateTimeIndex`` that relied on index being sorted for correct results (:issue:`13306`) +- Bug in ``pd.read_hdf()`` where attempting to load an HDF file with a single dataset, that had one or more categorical columns, failed unless the key argument was set to the name of the dataset. (:issue:`13231`) +- Bug in ``.rolling()`` that allowed a negative integer window in contruction of the ``Rolling()`` object, but would later fail on aggregation (:issue:`13383`) + +- Bug in various index types, which did not propagate the name of passed index (:issue:`12309`) +- Bug in ``DatetimeIndex``, which did not honour the ``copy=True`` (:issue:`13205`) +- Bug in ``DatetimeIndex.is_normalized`` returns incorrectly for normalized date_range in case of local timezones (:issue:`13459`) + +- Bug in ``DataFrame.to_csv()`` in which float values were being quoted even though quotations were specified for non-numeric values only (:issue:`12922`, :issue:`13259`) +- Bug in ``MultiIndex`` slicing where extra elements were returned when level is non-unique (:issue:`12896`) +- Bug in ``.str.replace`` does not raise ``TypeError`` for invalid replacement (:issue:`13438`) + + +- Bug in ``pd.read_csv()`` with ``engine='python'`` in which ``NaN`` values weren't being detected after data was converted to numeric values (:issue:`13314`) +- Bug in ``pd.read_csv()`` in which the ``nrows`` argument was not properly validated for both engines (:issue:`10476`) +- Bug in ``pd.read_csv()`` with ``engine='python'`` in which infinities of mixed-case forms were not being interpreted properly (:issue:`13274`) +- Bug in ``pd.read_csv()`` with ``engine='python'`` in which trailing ``NaN`` values were not being parsed (:issue:`13320`) +- Bug in ``pd.read_csv()`` with ``engine='python'`` when reading from a tempfile.TemporaryFile on Windows with Python 3 (:issue:`13398`) +- Bug in ``pd.read_csv()`` that prevents ``usecols`` kwarg from accepting single-byte unicode strings (:issue:`13219`) +- Bug in ``pd.read_csv()`` that prevents ``usecols`` from being an empty set (:issue:`13402`) +- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which null ``quotechar`` was not accepted even though ``quoting`` was specified as ``None`` (:issue:`13411`) +- Bug in ``pd.read_csv()`` with ``engine=='c'`` in which fields were not properly cast to float when quoting was specified as non-numeric (:issue:`13411`) +- Bug in ``pd.pivot_table()`` where ``margins_name`` is ignored when ``aggfunc`` is a list (:issue:`13354`) + + + +- Bug in ``Series`` arithmetic raises ``TypeError`` if it contains datetime-like as ``object`` dtype (:issue:`13043`) + + +- Bug in ``pd.to_datetime()`` when passing invalid datatypes (e.g. bool); will now respect the ``errors`` keyword (:issue:`13176`) +- Bug in ``pd.to_datetime()`` which overflowed on ``int8``, `int16`` dtypes (:issue:`13451`) +- Bug in extension dtype creation where the created types were not is/identical (:issue:`13285`) + +- Bug in ``NaT`` - ``Period`` raises ``AttributeError`` (:issue:`13071`) +- Bug in ``Period`` addition raises ``TypeError`` if ``Period`` is on right hand side (:issue:`13069`) +- Bug in ``Peirod`` and ``Series`` or ``Index`` comparison raises ``TypeError`` (:issue:`13200`) +- Bug in ``pd.set_eng_float_format()`` that would prevent NaN's from formatting (:issue:`11981`) +- Bug in ``.unstack`` with ``Categorical`` dtype resets ``.ordered`` to ``True`` (:issue:`13249`) + + +- Bug in ``Series`` comparison operators when dealing with zero dim NumPy arrays (:issue:`13006`) +- Bug in ``groupby`` where ``apply`` returns different result depending on whether first result is ``None`` or not (:issue:`12824`) +- Bug in ``groupby(..).nth()`` where the group key is included inconsistently if called after ``.head()/.tail()`` (:issue:`12839`) + +- Bug in ``pd.to_numeric`` when ``errors='coerce'`` and input contains non-hashable objects (:issue:`13324`) + + +- Bug in ``Categorical.remove_unused_categories()`` changes ``.codes`` dtype to platform int (:issue:`13261`) +- Bug in ``groupby`` with ``as_index=False`` returns all NaN's when grouping on multiple columns including a categorical one (:issue:`13204`) + +- Bug where ``pd.read_gbq()`` could throw ``ImportError: No module named discovery`` as a result of a naming conflict with another python package called apiclient (:issue:`13454`) diff --git a/doc/source/whatsnew/v0.20.0.txt b/doc/source/whatsnew/v0.20.0.txt new file mode 100644 index 0000000000000..695e917c76ba0 --- /dev/null +++ b/doc/source/whatsnew/v0.20.0.txt @@ -0,0 +1,83 @@ +.. _whatsnew_0200: + +v0.20.0 (????, 2016) +-------------------- + +This is a major release from 0.19 and includes a small number of API changes, several new features, +enhancements, and performance improvements along with a large number of bug fixes. We recommend that all +users upgrade to this version. + +Highlights include: + + +Check the :ref:`API Changes ` and :ref:`deprecations ` before updating. + +.. contents:: What's new in v0.19.0 + :local: + :backlinks: none + +.. _whatsnew_0200.enhancements: + +New features +~~~~~~~~~~~~ + + + + + +.. _whatsnew_0200.enhancements.other: + +Other enhancements +^^^^^^^^^^^^^^^^^^ + + + + + + +.. _whatsnew_0200.api_breaking: + +Backwards incompatible API changes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. _whatsnew_0200.api: + + + + + + +Other API Changes +^^^^^^^^^^^^^^^^^ + +.. _whatsnew_0200.deprecations: + +Deprecations +^^^^^^^^^^^^ + + + + + +.. _whatsnew_0200.prior_deprecations: + +Removal of prior version deprecations/changes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + + + + +.. _whatsnew_0200.performance: + +Performance Improvements +~~~~~~~~~~~~~~~~~~~~~~~~ + + + + + +.. _whatsnew_0200.bug_fixes: + +Bug Fixes +~~~~~~~~~ diff --git a/pandas/computation/ops.py b/pandas/computation/ops.py index bf6fa35cf255f..7a0743f6b2778 100644 --- a/pandas/computation/ops.py +++ b/pandas/computation/ops.py @@ -286,7 +286,7 @@ def _cast_inplace(terms, acceptable_dtypes, dtype): acceptable_dtypes : list of acceptable numpy.dtype Will not cast if term's dtype in this list. - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 dtype : str or numpy.dtype The dtype to cast to. diff --git a/pandas/core/base.py b/pandas/core/base.py index 96732a7140f9e..13a6b4b7b4ce0 100644 --- a/pandas/core/base.py +++ b/pandas/core/base.py @@ -1001,7 +1001,7 @@ def is_monotonic(self): Return boolean if values in the object are monotonic_increasing - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Returns ------- @@ -1017,7 +1017,7 @@ def is_monotonic_decreasing(self): Return boolean if values in the object are monotonic_decreasing - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Returns ------- diff --git a/pandas/core/categorical.py b/pandas/core/categorical.py index 6dba41a746e19..f4aeaf9184d09 100644 --- a/pandas/core/categorical.py +++ b/pandas/core/categorical.py @@ -348,7 +348,7 @@ def astype(self, dtype, copy=True): If copy is set to False and dtype is categorical, the original object is returned. - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 """ if is_categorical_dtype(dtype): diff --git a/pandas/core/generic.py b/pandas/core/generic.py index cc5c45158bf4f..7b271df4085cc 100644 --- a/pandas/core/generic.py +++ b/pandas/core/generic.py @@ -3642,7 +3642,7 @@ def asof(self, where, subset=None): The last row without any NaN is taken (or the last row without NaN considering only the subset of columns in the case of a DataFrame) - .. versionadded:: 0.18.2 For DataFrame + .. versionadded:: 0.19.0 For DataFrame If there is no good value, NaN is returned. diff --git a/pandas/indexes/base.py b/pandas/indexes/base.py index 96472698ba9d9..ad27010714f63 100644 --- a/pandas/indexes/base.py +++ b/pandas/indexes/base.py @@ -378,7 +378,7 @@ def _shallow_copy_with_infer(self, values=None, **kwargs): def _deepcopy_if_needed(self, orig, copy=False): """ - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Make a copy of self if data coincides (in memory) with orig. Subclasses should override this if self._base is not an ndarray. @@ -494,7 +494,7 @@ def repeat(self, n, *args, **kwargs): def where(self, cond, other=None): """ - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Return an Index of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from @@ -813,7 +813,7 @@ def _to_embed(self, keep_tz=False): satisfied, the original data is used to create a new Index or the original Index is returned. - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 """ diff --git a/pandas/indexes/category.py b/pandas/indexes/category.py index 3b7c660f5faa1..84b8926f4177f 100644 --- a/pandas/indexes/category.py +++ b/pandas/indexes/category.py @@ -313,7 +313,7 @@ def _can_reindex(self, indexer): def where(self, cond, other=None): """ - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Return an Index of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from diff --git a/pandas/io/html.py b/pandas/io/html.py index 48caaa39dd711..609642e248eda 100644 --- a/pandas/io/html.py +++ b/pandas/io/html.py @@ -837,7 +837,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None, Character to recognize as decimal point (e.g. use ',' for European data). - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Returns ------- diff --git a/pandas/io/pytables.py b/pandas/io/pytables.py index cbe04349b5105..d4ca717ddbc4e 100644 --- a/pandas/io/pytables.py +++ b/pandas/io/pytables.py @@ -276,7 +276,7 @@ def read_hdf(path_or_buf, key=None, **kwargs): path_or_buf : path (string), buffer, or path object (pathlib.Path or py._path.local.LocalPath) to read from - .. versionadded:: 0.18.2 support for pathlib, py.path. + .. versionadded:: 0.19.0 support for pathlib, py.path. key : group identifier in the store. Can be omitted a HDF file contains a single pandas object. diff --git a/pandas/tools/merge.py b/pandas/tools/merge.py index 4b7162398738e..d65dfc3254465 100644 --- a/pandas/tools/merge.py +++ b/pandas/tools/merge.py @@ -182,7 +182,7 @@ def merge_ordered(left, right, on=None, * outer: use union of keys from both frames (SQL: full outer join) * inner: use intersection of keys from both frames (SQL: inner join) - .. versionadded 0.18.2 + .. versionadded:: 0.19.0 Examples -------- @@ -263,7 +263,7 @@ def merge_asof(left, right, on=None, Optionally perform group-wise merge. This searches for the nearest match on the 'on' key within the same group according to 'by'. - .. versionadded 0.18.2 + .. versionadded:: 0.19.0 Parameters ---------- diff --git a/pandas/tseries/base.py b/pandas/tseries/base.py index 42631d442a990..2e3d1ace9734c 100644 --- a/pandas/tseries/base.py +++ b/pandas/tseries/base.py @@ -747,7 +747,7 @@ def repeat(self, repeats, *args, **kwargs): def where(self, cond, other=None): """ - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Return an Index of same shape as self and whose corresponding entries are from self where cond is True and otherwise are from diff --git a/pandas/tseries/index.py b/pandas/tseries/index.py index 77500081be62c..83cb768b37aaa 100644 --- a/pandas/tseries/index.py +++ b/pandas/tseries/index.py @@ -1857,7 +1857,7 @@ def tz_localize(self, tz, ambiguous='raise', errors='raise'): - 'coerce' will return NaT if the timestamp can not be converted into the specified timezone - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 infer_dst : boolean, default False (DEPRECATED) Attempt to infer fall dst-transition hours based on order diff --git a/pandas/tseries/offsets.py b/pandas/tseries/offsets.py index f4b75ddd72126..d0b1fd746d0d5 100644 --- a/pandas/tseries/offsets.py +++ b/pandas/tseries/offsets.py @@ -1258,7 +1258,7 @@ class SemiMonthEnd(SemiMonthOffset): Two DateOffset's per month repeating on the last day of the month and day_of_month. - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Parameters ---------- @@ -1317,7 +1317,7 @@ class SemiMonthBegin(SemiMonthOffset): Two DateOffset's per month repeating on the first day of the month and day_of_month. - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Parameters ---------- diff --git a/pandas/tslib.pyx b/pandas/tslib.pyx index 8837881af0b6c..df6554fe1d5de 100644 --- a/pandas/tslib.pyx +++ b/pandas/tslib.pyx @@ -246,7 +246,7 @@ class Timestamp(_Timestamp): :func:`datetime.datetime` Parameters ------------------------------------ - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 year : int month : int @@ -539,7 +539,7 @@ class Timestamp(_Timestamp): - 'coerce' will return NaT if the timestamp can not be converted into the specified timezone - .. versionadded:: 0.18.2 + .. versionadded:: 0.19.0 Returns ------- diff --git a/pandas/types/concat.py b/pandas/types/concat.py index 53db9ddf79a5c..44338f26eb2e8 100644 --- a/pandas/types/concat.py +++ b/pandas/types/concat.py @@ -206,7 +206,7 @@ def union_categoricals(to_union): Combine list-like of Categoricals, unioning categories. All must have the same dtype, and none can be ordered. - .. versionadded 0.18.2 + .. versionadded:: 0.19.0 Parameters ----------