
Commit ddf42c7

Merge remote-tracking branch 'upstream/master' into to_html-to_string

* upstream/master:
  BUG: to_html misses truncation indicators (...) when index=False (pandas-dev#22786)
  API/DEPR: replace "raise_conflict" with "errors" for df.update (pandas-dev#23657)
  BUG: Append DataFrame to Series with dateutil timezone (pandas-dev#23685)
  CLN/CI: Catch that stderr-warning! (pandas-dev#23706)
  ENH: Allow for join between two multi-index dataframe instances (pandas-dev#20356)
  Ensure Index._data is an ndarray (pandas-dev#23628)
  DOC: flake8-per-pr for windows users (pandas-dev#23707)
  DOC: Handle exceptions when computing contributors. (pandas-dev#23714)
  DOC: Validate space before colon docstring parameters pandas-dev#23483 (pandas-dev#23506)
  BUG-22984 Fix truncation of DataFrame representations (pandas-dev#22987)

2 parents 2bb90d4 + 8af7637, commit ddf42c7

24 files changed (+1193, -662 lines)

doc/source/contributing.rst

+6-13
@@ -591,21 +591,14 @@ run this slightly modified command::

     git diff master --name-only -- "*.py" | grep "pandas/" | xargs flake8

-Note that on Windows, these commands are unfortunately not possible because
-commands like ``grep`` and ``xargs`` are not available natively. To imitate the
-behavior with the commands above, you should run::
+Windows does not support the ``grep`` and ``xargs`` commands (unless installed
+for example via the `MinGW <http://www.mingw.org/>`__ toolchain), but one can
+imitate the behaviour as follows::

-    git diff master --name-only -- "*.py"
+    for /f %i in ('git diff upstream/master --name-only ^| findstr pandas/') do flake8 %i

-This will list all of the Python files that have been modified. The only ones
-that matter during linting are any whose directory filepath begins with "pandas."
-For each filepath, copy and paste it after the ``flake8`` command as shown below:
-
-    flake8 <python-filepath>
-
-Alternatively, you can install the ``grep`` and ``xargs`` commands via the
-`MinGW <http://www.mingw.org/>`__ toolchain, and it will allow you to run the
-commands above.
+This will also get all the files being changed by the PR (and within the
+``pandas/`` folder), and run ``flake8`` on them one after the other.

 .. _contributing.import-formatting:
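For illustration only (not part of this diff), the same "run flake8 on the changed pandas files" step can also be sketched in cross-platform Python; the snippet below assumes a remote named ``upstream`` and that ``git`` and ``flake8`` are available on the PATH:

    import subprocess

    # List files changed relative to upstream/master (assumes a remote named
    # 'upstream' exists in the local clone).
    changed = subprocess.check_output(
        ['git', 'diff', 'upstream/master', '--name-only'], text=True
    ).splitlines()

    # Lint each modified Python file under pandas/, one at a time.
    for path in changed:
        if path.startswith('pandas/') and path.endswith('.py'):
            subprocess.run(['flake8', path])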

doc/source/whatsnew/v0.24.0.rst

+46-1
@@ -184,6 +184,47 @@ array, but rather an ``ExtensionArray``:
 This is the same behavior as ``Series.values`` for categorical data. See
 :ref:`whatsnew_0240.api_breaking.interval_values` for more.

+.. _whatsnew_0240.enhancements.join_with_two_multiindexes:
+
+Joining with two multi-indexes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:func:`DataFrame.merge` and :func:`DataFrame.join` can now be used to join multi-indexed ``DataFrame`` instances on the overlapping index levels (:issue:`6360`)
+
+See the :ref:`Merge, join, and concatenate
+<merging.Join_with_two_multi_indexes>` documentation section.
+
+.. ipython:: python
+
+    index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
+                                            ('K1', 'X2')],
+                                           names=['key', 'X'])
+
+
+    left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
+                         'B': ['B0', 'B1', 'B2']},
+                        index=index_left)
+
+
+    index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
+                                             ('K2', 'Y2'), ('K2', 'Y3')],
+                                            names=['key', 'Y'])
+
+
+    right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
+                          'D': ['D0', 'D1', 'D2', 'D3']},
+                         index=index_right)
+
+
+    left.join(right)
+
+For earlier versions this can be done using the following.
+
+.. ipython:: python
+
+    pd.merge(left.reset_index(), right.reset_index(),
+             on=['key'], how='inner').set_index(['key', 'X', 'Y'])
+
 .. _whatsnew_0240.enhancements.rename_axis:

 Renaming names in a MultiIndex
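The ipython block above, as a standalone script (a sketch assuming pandas 0.24 or later, where join/merge can align on overlapping MultiIndex levels):

    import pandas as pd

    index_left = pd.MultiIndex.from_tuples(
        [('K0', 'X0'), ('K0', 'X1'), ('K1', 'X2')], names=['key', 'X'])
    left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                         'B': ['B0', 'B1', 'B2']}, index=index_left)

    index_right = pd.MultiIndex.from_tuples(
        [('K0', 'Y0'), ('K1', 'Y1'), ('K2', 'Y2'), ('K2', 'Y3')],
        names=['key', 'Y'])
    right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                          'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)

    # Joins on the shared 'key' level; the result carries the combined
    # ['key', 'X', 'Y'] MultiIndex.
    print(left.join(right))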
@@ -983,6 +1024,7 @@ Deprecations
 - The ``fastpath`` keyword of the different Index constructors is deprecated (:issue:`23110`).
 - :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have deprecated the ``errors`` argument in favor of the ``nonexistent`` argument (:issue:`8917`)
 - The class ``FrozenNDArray`` has been deprecated. When unpickling, ``FrozenNDArray`` will be unpickled to ``np.ndarray`` once this class is removed (:issue:`9031`)
+- The methods :meth:`DataFrame.update` and :meth:`Panel.update` have deprecated the ``raise_conflict=False|True`` keyword in favor of ``errors='ignore'|'raise'`` (:issue:`23585`)
 - Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
   `use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
 - :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
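To illustrate the DataFrame.update entry above (a sketch, assuming pandas 0.24): the old keyword is still accepted but is translated to the new one and warns, while the new spelling is silent:

    import warnings

    import pandas as pd

    df = pd.DataFrame({'A': [1.0, None]})
    other = pd.DataFrame({'A': [None, 2.0]})

    # Old spelling: still accepted, but expected to emit a FutureWarning and
    # be mapped to errors='ignore' internally.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        df.update(other, raise_conflict=False)
    print([str(w.message) for w in caught])

    # New spelling: no warning.
    df.update(other, errors='ignore')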
@@ -1321,7 +1363,9 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
 - :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
 - :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
 - Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
-- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
+- Bug in :func:`to_html()` with ``index=False`` misses truncation indicators (...) on truncated DataFrame (:issue:`15019`, :issue:`22783`)
+- Bug in :func:`DataFrame.to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
+- Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
 - Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
 - Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
 - Bug in :meth:`read_csv()` in which :class:`MultiIndex` index names were being improperly handled in the cases when they were not provided (:issue:`23484`)
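A quick check of the to_html entry above (not part of this diff, assuming pandas 0.24): with ``index=False`` the truncated HTML output should still contain the '...' indicator:

    import pandas as pd

    df = pd.DataFrame({'col%d' % i: [1, 2] for i in range(20)})

    # Truncate to 5 display columns; the ellipsis column should survive even
    # though the index is not rendered.
    html = df.to_html(index=False, max_cols=5)
    print('...' in html)  # expected: True with the fix applied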
@@ -1374,6 +1418,7 @@ Reshaping
 - Bug in :func:`pandas.concat` when concatenating a multicolumn DataFrame with tz-aware data against a DataFrame with a different number of columns (:issue:`22796`)
 - Bug in :func:`merge_asof` where confusing error message raised when attempting to merge with missing values (:issue:`23189`)
 - Bug in :meth:`DataFrame.nsmallest` and :meth:`DataFrame.nlargest` for dataframes that have a :class:`MultiIndex` for columns (:issue:`23033`).
+- Bug in :meth:`DataFrame.append` with a :class:`Series` with a dateutil timezone would raise a ``TypeError`` (:issue:`23682`)

 .. _whatsnew_0240.bug_fixes.sparse:
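The DataFrame.append entry above can be exercised with a short sketch (not part of this diff, assuming pandas 0.24 and python-dateutil installed); on earlier versions this raised a TypeError:

    import pandas as pd
    from dateutil import tz

    s = pd.Series({'a': 1,
                   'ts': pd.Timestamp('2018-11-13', tz=tz.gettz('US/Eastern'))})

    # Appending a Series carrying a dateutil timezone no longer raises.
    df = pd.DataFrame()
    print(df.append(s, ignore_index=True))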

doc/sphinxext/contributors.py

+20-11
@@ -10,6 +10,7 @@
 """
 from docutils import nodes
 from docutils.parsers.rst import Directive
+import git

 from announce import build_components

@@ -19,17 +20,25 @@ class ContributorsDirective(Directive):
     name = 'contributors'

     def run(self):
-        components = build_components(self.arguments[0])
-
-        message = nodes.paragraph()
-        message += nodes.Text(components['author_message'])
-
-        listnode = nodes.bullet_list()
-
-        for author in components['authors']:
-            para = nodes.paragraph()
-            para += nodes.Text(author)
-            listnode += nodes.list_item('', para)
+        range_ = self.arguments[0]
+        try:
+            components = build_components(range_)
+        except git.GitCommandError:
+            return [
+                self.state.document.reporter.warning(
+                    "Cannot find contributors for range '{}'".format(range_),
+                    line=self.lineno)
+            ]
+        else:
+            message = nodes.paragraph()
+            message += nodes.Text(components['author_message'])
+
+            listnode = nodes.bullet_list()
+
+            for author in components['authors']:
+                para = nodes.paragraph()
+                para += nodes.Text(author)
+                listnode += nodes.list_item('', para)

         return [message, listnode]
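For context on the new except clause (a sketch, not part of this diff): GitPython reports failed git invocations as git.GitCommandError, which is what the directive now converts into a Sphinx warning instead of crashing the build. The revision range below is deliberately bogus and assumes the working directory is a git checkout:

    import git

    repo = git.Repo('.')  # assumes the current directory is a git repository
    try:
        # A revision range that cannot be resolved makes `git log` fail ...
        repo.git.log('--oneline', 'vDOES_NOT_EXIST..vALSO_MISSING')
    except git.GitCommandError as exc:
        # ... and GitPython raises GitCommandError, which callers can catch.
        print('git log failed:', exc)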

pandas/_libs/lib.pyx

+12-11
@@ -48,8 +48,7 @@ cdef extern from "src/parse_helper.h":
     int floatify(object, float64_t *result, int *maybe_int) except -1

 cimport util
-from util cimport (is_nan,
-                   UINT8_MAX, UINT64_MAX, INT64_MAX, INT64_MIN)
+from util cimport is_nan, UINT64_MAX, INT64_MAX, INT64_MIN

 from tslib import array_to_datetime
 from tslibs.nattype cimport NPY_NAT
@@ -1642,20 +1641,22 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:

     if n == 0:
         return False
-
+    # Get a reference timezone to compare with the rest of the tzs in the array
     for i in range(n):
         base_val = values[i]
         if base_val is not NaT:
             base_tz = get_timezone(getattr(base_val, 'tzinfo', None))
-
-            for j in range(i, n):
-                val = values[j]
-                if val is not NaT:
-                    tz = getattr(val, 'tzinfo', None)
-                    if not tz_compare(base_tz, tz):
-                        return False
             break

+    for j in range(i, n):
+        # Compare val's timezone with the reference timezone
+        # NaT can coexist with tz-aware datetimes, so skip if encountered
+        val = values[j]
+        if val is not NaT:
+            tz = getattr(val, 'tzinfo', None)
+            if not tz_compare(base_tz, tz):
+                return False
+
     return True
@@ -2045,7 +2046,7 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,

     # we try to coerce datetime w/tz but must all have the same tz
     if seen.datetimetz_:
-        if len({getattr(val, 'tzinfo', None) for val in objects}) == 1:
+        if is_datetime_with_singletz_array(objects):
             from pandas import DatetimeIndex
             return DatetimeIndex(objects)
         seen.object_ = 1
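For readers who do not want to trace the Cython, a rough pure-Python sketch of the check that is_datetime_with_singletz_array performs (illustration only; has_single_tz is a hypothetical helper, and plain tzinfo equality stands in for pandas' more lenient tz_compare):

    from datetime import datetime, timedelta, timezone

    import pandas as pd

    def has_single_tz(values):
        """Return True if all non-NaT values share one timezone."""
        if len(values) == 0:
            return False
        # Find the first non-NaT value and use its timezone as the reference.
        base_tz = None
        start = 0
        for i, val in enumerate(values):
            if val is not pd.NaT:
                base_tz = getattr(val, 'tzinfo', None)
                start = i
                break
        # Every later non-NaT value must carry the same timezone; NaT is skipped,
        # which is why NaT can coexist with tz-aware datetimes.
        for val in values[start:]:
            if val is not pd.NaT and getattr(val, 'tzinfo', None) != base_tz:
                return False
        return True

    utc = timezone.utc
    est = timezone(timedelta(hours=-5))
    print(has_single_tz([pd.NaT, datetime(2018, 1, 1, tzinfo=utc)]))  # True
    print(has_single_tz([datetime(2018, 1, 1, tzinfo=utc),
                         datetime(2018, 1, 1, tzinfo=est)]))          # False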

pandas/core/frame.py

+22-6
@@ -5203,8 +5203,10 @@ def combiner(x, y):

         return self.combine(other, combiner, overwrite=False)

+    @deprecate_kwarg(old_arg_name='raise_conflict', new_arg_name='errors',
+                     mapping={False: 'ignore', True: 'raise'})
     def update(self, other, join='left', overwrite=True, filter_func=None,
-               raise_conflict=False):
+               errors='ignore'):
         """
         Modify in place using non-NA values from another DataFrame.

@@ -5228,17 +5230,28 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
             * False: only update values that are NA in
               the original DataFrame.

-        filter_func : callable(1d-array) -> boolean 1d-array, optional
+        filter_func : callable(1d-array) -> bool 1d-array, optional
             Can choose to replace values other than NA. Return True for values
             that should be updated.
-        raise_conflict : bool, default False
-            If True, will raise a ValueError if the DataFrame and `other`
+        errors : {'raise', 'ignore'}, default 'ignore'
+            If 'raise', will raise a ValueError if the DataFrame and `other`
             both contain non-NA data in the same place.

+            .. versionchanged :: 0.24.0
+               Changed from `raise_conflict=False|True`
+               to `errors='ignore'|'raise'`.
+
+        Returns
+        -------
+        None : method directly changes calling object
+
         Raises
         ------
         ValueError
-            When `raise_conflict` is True and there's overlapping non-NA data.
+            * When `errors='raise'` and there's overlapping non-NA data.
+            * When `errors` is not either `'ignore'` or `'raise'`
+        NotImplementedError
+            * If `join != 'left'`

         See Also
         --------
@@ -5309,6 +5322,9 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
         # TODO: Support other joins
         if join != 'left':  # pragma: no cover
             raise NotImplementedError("Only left join is supported")
+        if errors not in ['ignore', 'raise']:
+            raise ValueError("The parameter errors must be either "
+                             "'ignore' or 'raise'")

         if not isinstance(other, DataFrame):
             other = DataFrame(other)
@@ -5322,7 +5338,7 @@ def update(self, other, join='left', overwrite=True, filter_func=None,
                 with np.errstate(all='ignore'):
                     mask = ~filter_func(this) | isna(that)
             else:
-                if raise_conflict:
+                if errors == 'raise':
                     mask_this = notna(that)
                     mask_that = notna(this)
                     if any(mask_this & mask_that):
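A short usage sketch of the reworked keyword (not part of this diff, assuming pandas 0.24): errors='raise' turns overlapping non-NA data into a ValueError, and values other than 'ignore'/'raise' are rejected by the new validation:

    import pandas as pd

    df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 4.0]})
    other = pd.DataFrame({'A': [9.0, None]})

    # Overlapping non-NA data: with errors='raise' the update aborts.
    try:
        df.update(other, errors='raise')
    except ValueError as exc:
        print('conflict:', exc)

    # Anything other than 'ignore'/'raise' is rejected up front.
    try:
        df.update(other, errors='coerce')
    except ValueError as exc:
        print('bad keyword value:', exc)

    # Default behaviour: non-NA values from `other` overwrite in place.
    df.update(other)
    print(df)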
