
Commit b4bb5b3

Merge remote-tracking branch 'upstream/master' into pr/arw2019/to_datetime-inconsistent-parsing

2 parents: 07834ed + 8b90070

File tree

225 files changed: +5867 / -4631 lines


.github/ISSUE_TEMPLATE/submit_question.md (+43/-24)

The old 24-line Markdown template was deleted; a 43-line issue form takes its place:

@@ -0,0 +1,43 @@
+name: Submit Question
+description: Ask a general question about pandas
+title: "QST: "
+labels: [Usage Question, Needs Triage]
+
+body:
+  - type: markdown
+    attributes:
+      value: >
+        Since [StackOverflow](https://stackoverflow.com) is better suited towards answering
+        usage questions, we ask that all usage questions are first asked on StackOverflow.
+  - type: checkboxes
+    attributes:
+      options:
+        - label: >
+            I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas)
+            on StackOverflow for similar questions.
+          required: true
+        - label: >
+            I have asked my usage related question on [StackOverflow](https://stackoverflow.com).
+          required: true
+  - type: input
+    id: question-link
+    attributes:
+      label: Link to question on StackOverflow
+    validations:
+      required: true
+  - type: markdown
+    attributes:
+      value: ---
+  - type: textarea
+    id: question
+    attributes:
+      label: Question about pandas
+      description: >
+        **Note**: If you'd still like to submit a question, please read [this guide](
+        https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing
+        how to provide the necessary information for us to reproduce your question.
+      placeholder: |
+        ```python
+        # Your code here, if applicable
+
+        ```
.github/workflows/ci.yml (-8)

@@ -32,10 +32,6 @@ jobs:
       with:
         fetch-depth: 0

-    - name: Looking for unwanted patterns
-      run: ci/code_checks.sh patterns
-      if: always()
-
     - name: Cache conda
       uses: actions/cache@v2
       with:
@@ -52,10 +48,6 @@ jobs:
     - name: Build Pandas
      uses: ./.github/actions/build_pandas

-    - name: Linting
-      run: ci/code_checks.sh lint
-      if: always()
-
     - name: Checks on imported code
       run: ci/code_checks.sh code
       if: always()

.github/workflows/python-dev.yml (+1/-1)

@@ -41,7 +41,7 @@ jobs:
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip setuptools wheel
-        pip install git+https://github.com/numpy/numpy.git
+        pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
         pip install git+https://github.com/pytest-dev/pytest.git
         pip install git+https://github.com/nedbat/coveragepy.git
         pip install cython python-dateutil pytz hypothesis pytest-xdist pytest-cov

.pre-commit-config.yaml (+36/-3)

@@ -9,7 +9,7 @@ repos:
   - id: absolufy-imports
     files: ^pandas/
 - repo: https://github.com/python/black
-  rev: 21.6b0
+  rev: 21.7b0
   hooks:
   - id: black
 - repo: https://github.com/codespell-project/codespell
@@ -44,6 +44,7 @@ repos:
     - flake8-bugbear==21.3.2
     - pandas-dev-flaker==0.2.0
   - id: flake8
+    alias: flake8-cython
     name: flake8 (cython)
     types: [cython]
     args: [--append-config=flake8/cython.cfg]
@@ -53,11 +54,11 @@ repos:
     types: [text]
     args: [--append-config=flake8/cython-template.cfg]
 - repo: https://github.com/PyCQA/isort
-  rev: 5.9.2
+  rev: 5.9.3
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v2.21.0
+  rev: v2.23.3
   hooks:
   - id: pyupgrade
     args: [--py38-plus]
@@ -102,7 +103,34 @@ repos:
         # Incorrect code-block / IPython directives
         |\.\.\ code-block\ ::
         |\.\.\ ipython\ ::
+
+        # Check for deprecated messages without sphinx directive
+        |(DEPRECATED|DEPRECATE|Deprecated)(:|,|\.)
     types_or: [python, cython, rst]
+  - id: cython-casting
+    name: Check Cython casting is `<type>obj`, not `<type> obj`
+    language: pygrep
+    entry: '[a-zA-Z0-9*]> '
+    files: (\.pyx|\.pxi.in)$
+  - id: incorrect-backticks
+    name: Check for backticks incorrectly rendering because of missing spaces
+    language: pygrep
+    entry: '[a-zA-Z0-9]\`\`?[a-zA-Z0-9]'
+    types: [rst]
+    files: ^doc/source/
+  - id: seed-check-asv
+    name: Check for unnecessary random seeds in asv benchmarks
+    language: pygrep
+    entry: 'np\.random\.seed'
+    files: ^asv_bench/benchmarks
+    exclude: ^asv_bench/benchmarks/pandas_vb_common\.py
+  - id: invalid-ea-testing
+    name: Check for invalid EA testing
+    language: pygrep
+    entry: 'tm\.assert_(series|frame)_equal'
+    files: ^pandas/tests/extension/base
+    types: [python]
+    exclude: ^pandas/tests/extension/base/base\.py
   - id: pip-to-conda
     name: Generate pip dependency from conda
     description: This hook checks if the conda environment.yml and requirements-dev.txt are equal
@@ -136,3 +164,8 @@ repos:
     entry: python scripts/no_bool_in_generic.py
     language: python
     files: ^pandas/core/generic\.py$
+  - id: pandas-errors-documented
+    name: Ensure pandas errors are documented in doc/source/reference/general_utility_functions.rst
+    entry: python scripts/pandas_errors_documented.py
+    language: python
+    files: ^pandas/errors/__init__.py$

asv_bench/asv.conf.json (+1/-4)

@@ -46,17 +46,14 @@
     "numba": [],
     "numexpr": [],
     "pytables": [null, ""], // platform dependent, see excludes below
+    "pyarrow": [],
     "tables": [null, ""],
     "openpyxl": [],
     "xlsxwriter": [],
     "xlrd": [],
     "xlwt": [],
     "odfpy": [],
-    "pytest": [],
     "jinja2": [],
-    // If using Windows with python 2.7 and want to build using the
-    // mingw toolchain (rather than MSVC), uncomment the following line.
-    // "libpython": [],
 },
 "conda_channels": ["defaults", "conda-forge"],
 // Combinations of libraries/python versions can be excluded/included

asv_bench/benchmarks/dtypes.py (+3/-3)

@@ -51,9 +51,9 @@ def time_pandas_dtype_invalid(self, dtype):
 class SelectDtypes:

     params = [
-        tm.ALL_INT_DTYPES
-        + tm.ALL_EA_INT_DTYPES
-        + tm.FLOAT_DTYPES
+        tm.ALL_INT_NUMPY_DTYPES
+        + tm.ALL_INT_EA_DTYPES
+        + tm.FLOAT_NUMPY_DTYPES
         + tm.COMPLEX_DTYPES
         + tm.DATETIME64_DTYPES
         + tm.TIMEDELTA64_DTYPES
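The renamed `tm.*` constants are internal `pandas._testing` dtype lists; the benchmark itself times `DataFrame.select_dtypes` over those dtypes. A minimal sketch of the operation being exercised (the small frame and dtype choices here are illustrative, not from the benchmark):

```python
import numpy as np
import pandas as pd

# select_dtypes filters a frame's columns by dtype, which is what the
# SelectDtypes benchmark parametrizes over the tm.* dtype lists.
df = pd.DataFrame(
    {
        "i": np.array([1, 2, 3], dtype="int64"),
        "f": np.array([1.0, 2.0, 3.0], dtype="float64"),
        "s": ["a", "b", "c"],
    }
)

ints = df.select_dtypes(include=["int64"])
floats = df.select_dtypes(include=["float64"])

assert list(ints.columns) == ["i"]
assert list(floats.columns) == ["f"]
```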

asv_bench/benchmarks/frame_ctor.py (+8)

@@ -2,6 +2,7 @@

 import pandas as pd
 from pandas import (
+    Categorical,
     DataFrame,
     MultiIndex,
     Series,
@@ -31,6 +32,9 @@ def setup(self):
         self.dict_list = frame.to_dict(orient="records")
         self.data2 = {i: {j: float(j) for j in range(100)} for i in range(2000)}

+        # arrays which we wont consolidate
+        self.dict_of_categoricals = {i: Categorical(np.arange(N)) for i in range(K)}
+
     def time_list_of_dict(self):
         DataFrame(self.dict_list)

@@ -50,6 +54,10 @@ def time_nested_dict_int64(self):
         # nested dict, integer indexes, regression described in #621
         DataFrame(self.data2)

+    def time_dict_of_categoricals(self):
+        # dict of arrays that we wont consolidate
+        DataFrame(self.dict_of_categoricals)
+

 class FromSeries:
     def setup(self):
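The new case builds a DataFrame from a dict of `Categorical` arrays. Categoricals are extension arrays, so each becomes its own column block rather than being consolidated into one 2-D block, which is the construction path this benchmark isolates. A small standalone sketch (sizes here are stand-ins for the benchmark's `N` and `K`):

```python
import numpy as np
import pandas as pd

# Construct a frame from a dict of Categorical arrays; each column stays
# a separate (non-consolidated) extension-array block.
N, K = 10, 5
data = {i: pd.Categorical(np.arange(N)) for i in range(K)}
df = pd.DataFrame(data)

assert df.shape == (N, K)
assert all(str(dtype) == "category" for dtype in df.dtypes)
```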

asv_bench/benchmarks/frame_methods.py (+6/-2)

@@ -538,8 +538,12 @@ class Interpolate:
     def setup(self, downcast):
         N = 10000
         # this is the worst case, where every column has NaNs.
-        self.df = DataFrame(np.random.randn(N, 100))
-        self.df.values[::2] = np.nan
+        arr = np.random.randn(N, 100)
+        # NB: we need to set values in array, not in df.values, otherwise
+        # the benchmark will be misleading for ArrayManager
+        arr[::2] = np.nan
+
+        self.df = DataFrame(arr)

         self.df2 = DataFrame(
             {
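The point of this change: `df.values` can be a freshly materialized copy of the frame's data (under ArrayManager it always is), so writing NaNs into it would leave the frame itself untouched and the interpolation benchmark would time an all-valid frame. Mutating the source array before construction guarantees the NaNs land in the frame. A sketch of the pattern (smaller sizes than the benchmark):

```python
import numpy as np
import pandas as pd

# Write NaNs into the ndarray *before* building the DataFrame, so the
# frame is guaranteed to contain them regardless of how it stores data.
N = 100
arr = np.random.randn(N, 10)
arr[::2] = np.nan  # every other row is all-NaN
df = pd.DataFrame(arr)

assert df.isna().any(axis=1).sum() == N // 2

# interpolate() fills the NaN rows from their neighbors (row 0 has no
# earlier neighbor, so it stays NaN under the default forward direction).
interpolated = df.interpolate()
assert interpolated.iloc[1:].notna().all().all()
```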

asv_bench/benchmarks/groupby.py (+24/-11)

@@ -403,7 +403,7 @@ def time_srs_bfill(self):

 class GroupByMethods:

-    param_names = ["dtype", "method", "application"]
+    param_names = ["dtype", "method", "application", "ncols"]
     params = [
         ["int", "float", "object", "datetime", "uint"],
         [
@@ -443,15 +443,23 @@ class GroupByMethods:
             "var",
         ],
         ["direct", "transformation"],
+        [1, 2, 5, 10],
     ]

-    def setup(self, dtype, method, application):
+    def setup(self, dtype, method, application, ncols):
         if method in method_blocklist.get(dtype, {}):
             raise NotImplementedError  # skip benchmark
+
+        if ncols != 1 and method in ["value_counts", "unique"]:
+            # DataFrameGroupBy doesn't have these methods
+            raise NotImplementedError
+
         ngroups = 1000
         size = ngroups * 2
-        rng = np.arange(ngroups)
-        values = rng.take(np.random.randint(0, ngroups, size=size))
+        rng = np.arange(ngroups).reshape(-1, 1)
+        rng = np.broadcast_to(rng, (len(rng), ncols))
+        taker = np.random.randint(0, ngroups, size=size)
+        values = rng.take(taker, axis=0)
         if dtype == "int":
             key = np.random.randint(0, size, size=size)
         elif dtype == "uint":
@@ -465,22 +473,27 @@ def setup(self, dtype, method, application):
         elif dtype == "datetime":
             key = date_range("1/1/2011", periods=size, freq="s")

-        df = DataFrame({"values": values, "key": key})
+        cols = [f"values{n}" for n in range(ncols)]
+        df = DataFrame(values, columns=cols)
+        df["key"] = key
+
+        if len(cols) == 1:
+            cols = cols[0]

         if application == "transform":
             if method == "describe":
                 raise NotImplementedError

-            self.as_group_method = lambda: df.groupby("key")["values"].transform(method)
-            self.as_field_method = lambda: df.groupby("values")["key"].transform(method)
+            self.as_group_method = lambda: df.groupby("key")[cols].transform(method)
+            self.as_field_method = lambda: df.groupby(cols)["key"].transform(method)
         else:
-            self.as_group_method = getattr(df.groupby("key")["values"], method)
-            self.as_field_method = getattr(df.groupby("values")["key"], method)
+            self.as_group_method = getattr(df.groupby("key")[cols], method)
+            self.as_field_method = getattr(df.groupby(cols)["key"], method)

-    def time_dtype_as_group(self, dtype, method, application):
+    def time_dtype_as_group(self, dtype, method, application, ncols):
         self.as_group_method()

-    def time_dtype_as_field(self, dtype, method, application):
+    def time_dtype_as_field(self, dtype, method, application, ncols):
         self.as_field_method()
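The new `ncols` parameter matters because selecting a list of columns from a groupby yields a `DataFrameGroupBy`, while a single label yields a `SeriesGroupBy`, and those take different code paths. A small sketch of the two selections the benchmark now times (sizes shrunk from the benchmark's values):

```python
import numpy as np
import pandas as pd

# Mirror the benchmark's value construction at toy scale.
ngroups, size, ncols = 5, 10, 3
rng = np.broadcast_to(np.arange(ngroups).reshape(-1, 1), (ngroups, ncols))
taker = np.random.randint(0, ngroups, size=size)

cols = [f"values{n}" for n in range(ncols)]
df = pd.DataFrame(rng.take(taker, axis=0), columns=cols)
df["key"] = np.random.randint(0, size, size=size)

# A list selection gives a DataFrameGroupBy: transform returns a frame ...
wide = df.groupby("key")[cols].transform("mean")
# ... a single label gives a SeriesGroupBy: transform returns a series.
narrow = df.groupby("key")["values0"].transform("mean")

assert wide.shape == (size, ncols)
assert narrow.shape == (size,)
```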

asv_bench/benchmarks/indexing.py (+9)

@@ -366,11 +366,20 @@ class InsertColumns:
     def setup(self):
         self.N = 10 ** 3
         self.df = DataFrame(index=range(self.N))
+        self.df2 = DataFrame(np.random.randn(self.N, 2))

     def time_insert(self):
         for i in range(100):
             self.df.insert(0, i, np.random.randn(self.N), allow_duplicates=True)

+    def time_insert_middle(self):
+        # same as time_insert but inserting to a middle column rather than
+        # front or back (which have fast-paths)
+        for i in range(100):
+            self.df2.insert(
+                1, "colname", np.random.randn(self.N), allow_duplicates=True
+            )
+
     def time_assign_with_setitem(self):
         for i in range(100):
             self.df[i] = np.random.randn(self.N)
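`DataFrame.insert` at the front or back hits fast paths; inserting at a middle position has to shift the existing columns, which is what the new `time_insert_middle` case measures. A minimal standalone illustration of the middle insert:

```python
import numpy as np
import pandas as pd

# Insert a column at position 1 of a two-column frame: the second original
# column must move over to make room.
N = 1000
df = pd.DataFrame(np.random.randn(N, 2))
df.insert(1, "colname", np.random.randn(N), allow_duplicates=True)

assert list(df.columns) == [0, "colname", 1]
assert df.shape == (N, 3)
```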

asv_bench/benchmarks/io/json.py (+10/-6)

@@ -172,15 +172,19 @@ def time_to_json(self, orient, frame):
     def peakmem_to_json(self, orient, frame):
         getattr(self, frame).to_json(self.fname, orient=orient)

-    def time_to_json_wide(self, orient, frame):
+
+class ToJSONWide(ToJSON):
+    def setup(self, orient, frame):
+        super().setup(orient, frame)
         base_df = getattr(self, frame).copy()
-        df = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1)
-        df.to_json(self.fname, orient=orient)
+        df_wide = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1)
+        self.df_wide = df_wide
+
+    def time_to_json_wide(self, orient, frame):
+        self.df_wide.to_json(self.fname, orient=orient)

     def peakmem_to_json_wide(self, orient, frame):
-        base_df = getattr(self, frame).copy()
-        df = concat([base_df.iloc[:100]] * 1000, ignore_index=True, axis=1)
-        df.to_json(self.fname, orient=orient)
+        self.df_wide.to_json(self.fname, orient=orient)


 class ToJSONISO(BaseIO):
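The refactor moves the expensive wide-frame construction out of the timed functions into `setup` of a new `ToJSONWide` subclass, so the timings measure only `to_json` itself. A sketch of the wide-frame construction with a small base frame and fewer repeats (the benchmark tiles 100 rows 1000 times; sizes here are scaled down):

```python
import pandas as pd

# Tile the base frame side-by-side (axis=1) to produce a very wide frame;
# ignore_index=True gives the result a fresh RangeIndex of columns.
base_df = pd.DataFrame({"a": range(100), "b": range(100)})
df_wide = pd.concat([base_df.iloc[:100]] * 10, ignore_index=True, axis=1)

assert df_wide.shape == (100, 20)  # 2 columns * 10 repeats

json_str = df_wide.to_json(orient="records")
assert json_str.startswith("[")
```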

asv_bench/benchmarks/io/style.py (+2/-2)

@@ -36,11 +36,11 @@ def peakmem_classes_render(self, cols, rows):

     def time_format_render(self, cols, rows):
         self._style_format()
-        self.st.render()
+        self.st._render_html(True, True)

     def peakmem_format_render(self, cols, rows):
         self._style_format()
-        self.st.render()
+        self.st._render_html(True, True)

     def _style_apply(self):
         def _apply_func(s):
