Commit 43fca7c

Merge pull request #4 from pandas-dev/master
merges upstream
2 parents a1a1cb2 + b7e786e commit 43fca7c

596 files changed: +21699 -16127 lines changed

Diff for: .github/CONTRIBUTING.md

+1-1
@@ -2,7 +2,7 @@
 
 Whether you are a novice or experienced software developer, all contributions and suggestions are welcome!
 
-Our main contributing guide can be found [in this repo](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst) or [on the website](https://pandas-docs.github.io/pandas-docs-travis/development/contributing.html). If you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant sections of that document for further information.
+Our main contributing guide can be found [in this repo](https://github.com/pandas-dev/pandas/blob/master/doc/source/development/contributing.rst) or [on the website](https://pandas.pydata.org/docs/dev/development/contributing.html). If you do not want to read it in its entirety, we will summarize the main ways in which you can contribute and point to relevant sections of that document for further information.
 
 ## Getting Started
 

Diff for: .github/ISSUE_TEMPLATE.md

-29
This file was deleted.

Diff for: .github/ISSUE_TEMPLATE/bug_report.md

+39
@@ -0,0 +1,39 @@
+---
+
+name: Bug Report
+about: Create a bug report to help us improve pandas
+title: "BUG:"
+labels: "Bug, Needs Triage"
+
+---
+
+- [ ] I have checked that this issue has not already been reported.
+
+- [ ] I have confirmed this bug exists on the latest version of pandas.
+
+- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
+
+---
+
+**Note**: Please read [this guide](https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing how to provide the necessary information for us to reproduce your bug.
+
+#### Code Sample, a copy-pastable example
+
+```python
+# Your code here
+
+```
+
+#### Problem description
+
+[this should explain **why** the current behaviour is a problem and why the expected output is a better solution]
+
+#### Expected Output
+
+#### Output of ``pd.show_versions()``
+
+<details>
+
+[paste the output of ``pd.show_versions()`` here leaving a blank line after the details tag]
+
+</details>

Diff for: .github/ISSUE_TEMPLATE/documentation_improvement.md

+22
@@ -0,0 +1,22 @@
+---
+
+name: Documentation Improvement
+about: Report wrong or missing documentation
+title: "DOC:"
+labels: "Docs, Needs Triage"
+
+---
+
+#### Location of the documentation
+
+[this should provide the location of the documentation, e.g. "pandas.read_csv" or the URL of the documentation, e.g. "https://dev.pandas.io/docs/reference/api/pandas.read_csv.html"]
+
+**Note**: You can check the latest versions of the docs on `master` [here](https://pandas.pydata.org/docs/dev/).
+
+#### Documentation problem
+
+[this should provide a description of what documentation you believe needs to be fixed/improved]
+
+#### Suggested fix for documentation
+
+[this should explain the suggested fix and **why** it's better than the existing documentation]

Diff for: .github/ISSUE_TEMPLATE/feature_request.md

+33
@@ -0,0 +1,33 @@
+---
+
+name: Feature Request
+about: Suggest an idea for pandas
+title: "ENH:"
+labels: "Enhancement, Needs Triage"
+
+---
+
+#### Is your feature request related to a problem?
+
+[this should provide a description of what the problem is, e.g. "I wish I could use pandas to do [...]"]
+
+#### Describe the solution you'd like
+
+[this should provide a description of the feature request, e.g. "`DataFrame.foo` should get a new parameter `bar` that [...]", try to write a docstring for the desired feature]
+
+#### API breaking implications
+
+[this should provide a description of how this feature will affect the API]
+
+#### Describe alternatives you've considered
+
+[this should provide a description of any alternative solutions or features you've considered]
+
+#### Additional context
+
+[add any other context, code examples, or references to existing implementations about the feature request here]
+
+```python
+# Your code here, if applicable
+
+```

Diff for: .github/ISSUE_TEMPLATE/submit_question.md

+24
@@ -0,0 +1,24 @@
+---
+
+name: Submit Question
+about: Ask a general question about pandas
+title: "QST:"
+labels: "Usage Question, Needs Triage"
+
+---
+
+- [ ] I have searched the [[pandas] tag](https://stackoverflow.com/questions/tagged/pandas) on StackOverflow for similar questions.
+
+- [ ] I have asked my usage related question on [StackOverflow](https://stackoverflow.com).
+
+---
+
+#### Question about pandas
+
+**Note**: If you'd still like to submit a question, please read [this guide](
+https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) detailing how to provide the necessary information for us to reproduce your question.
+
+```python
+# Your code here, if applicable
+
+```

Diff for: .github/workflows/ci.yml

+9-23
@@ -125,32 +125,18 @@ jobs:
     - name: Check ipython directive errors
       run: "! grep -B1 \"^<<<-------------------------------------------------------------------------$\" sphinx.log"
 
-    - name: Install Rclone
-      run: sudo apt install rclone -y
-      if: github.event_name == 'push'
-
-    - name: Set up Rclone
+    - name: Install ssh key
       run: |
-        CONF=$HOME/.config/rclone/rclone.conf
-        mkdir -p `dirname $CONF`
-        echo "[ovh_host]" > $CONF
-        echo "type = swift" >> $CONF
-        echo "env_auth = false" >> $CONF
-        echo "auth_version = 3" >> $CONF
-        echo "auth = https://auth.cloud.ovh.net/v3/" >> $CONF
-        echo "endpoint_type = public" >> $CONF
-        echo "tenant_domain = default" >> $CONF
-        echo "tenant = 2977553886518025" >> $CONF
-        echo "domain = default" >> $CONF
-        echo "user = w4KGs3pmDxpd" >> $CONF
-        echo "key = ${{ secrets.ovh_object_store_key }}" >> $CONF
-        echo "region = BHS" >> $CONF
+        mkdir -m 700 -p ~/.ssh
+        echo "${{ secrets.server_ssh_key }}" > ~/.ssh/id_rsa
+        chmod 600 ~/.ssh/id_rsa
+        echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBE1Kkopomm7FHG5enATf7SgnpICZ4W2bw+Ho+afqin+w7sMcrsa0je7sbztFAV8YchDkiBKnWTG4cRT+KZgZCaY=" > ~/.ssh/known_hosts
       if: github.event_name == 'push'
 
-    - name: Sync web with OVH
-      run: rclone sync --exclude pandas-docs/** web/build ovh_host:prod
+    - name: Upload web
+      run: rsync -az --delete --exclude='pandas-docs' --exclude='docs' --exclude='Pandas_Cheat_Sheet*' web/build/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas
       if: github.event_name == 'push'
 
-    - name: Sync dev docs with OVH
-      run: rclone sync doc/build/html ovh_host:prod/pandas-docs/dev
+    - name: Upload dev docs
+      run: rsync -az --delete doc/build/html/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas/pandas-docs/dev
       if: github.event_name == 'push'

Diff for: LICENSES/HAVEN_LICENSE

+21-2
@@ -1,2 +1,21 @@
-YEAR: 2013-2016
-COPYRIGHT HOLDER: Hadley Wickham; RStudio; and Evan Miller
+# MIT License
+
+Copyright (c) 2019 Hadley Wickham; RStudio; and Evan Miller
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

Diff for: README.md

+1-1
@@ -158,7 +158,7 @@ Most development discussion is taking place on github in this repo. Further, the
 
 All contributions, bug reports, bug fixes, documentation improvements, enhancements and ideas are welcome.
 
-A detailed overview on how to contribute can be found in the **[contributing guide](https://dev.pandas.io/docs/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
+A detailed overview on how to contribute can be found in the **[contributing guide](https://pandas.pydata.org/docs/dev/development/contributing.html)**. There is also an [overview](.github/CONTRIBUTING.md) on GitHub.
 
 If you are simply looking to start working with the pandas codebase, navigate to the [GitHub "issues" tab](https://github.com/pandas-dev/pandas/issues) and start looking through interesting issues. There are a number of issues listed under [Docs](https://github.com/pandas-dev/pandas/issues?labels=Docs&sort=updated&state=open) and [good first issue](https://github.com/pandas-dev/pandas/issues?labels=good+first+issue&sort=updated&state=open) where you could start out.
 

Diff for: asv_bench/benchmarks/arithmetic.py

+17
@@ -50,6 +50,23 @@ def time_frame_op_with_scalar(self, dtype, scalar, op):
         op(self.df, scalar)
 
 
+class OpWithFillValue:
+    def setup(self):
+        # GH#31300
+        arr = np.arange(10 ** 6)
+        df = DataFrame({"A": arr})
+        ser = df["A"]
+
+        self.df = df
+        self.ser = ser
+
+    def time_frame_op_with_fill_value_no_nas(self):
+        self.df.add(self.df, fill_value=4)
+
+    def time_series_op_with_fill_value_no_nas(self):
+        self.ser.add(self.ser, fill_value=4)
+
+
 class MixedFrameWithSeriesAxis0:
     params = [
         [
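
Reviewer note: a minimal sketch of what the `OpWithFillValue` benchmark exercises, assuming only the public `DataFrame.add` and `Series.add` methods. The small array and the names `frame_sum`, `series_sum`, `left`, `right`, and `filled` are illustrative and not part of the commit. With no missing values, `fill_value` has nothing to substitute, so the result should match plain addition; the second half shows a case where it does change the outcome.

```python
import numpy as np
import pandas as pd

# Small stand-in for the benchmark's 10 ** 6 element array (size is an assumption).
arr = np.arange(5)
df = pd.DataFrame({"A": arr})
ser = df["A"]

# No NaNs on either side: fill_value is never used, so this is plain addition.
frame_sum = df.add(df, fill_value=4)
series_sum = ser.add(ser, fill_value=4)

# fill_value is substituted only where exactly one operand is missing.
left = pd.Series([1.0, np.nan, 3.0])
right = pd.Series([10.0, 20.0, np.nan])
filled = left.add(right, fill_value=4)

print(filled.tolist())
```

The benchmark names end in `_no_nas`, which suggests the timing targets the overhead of passing `fill_value` when there is nothing to fill.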

Diff for: asv_bench/benchmarks/finalize.py

+16
@@ -0,0 +1,16 @@
+import pandas as pd
+
+
+class Finalize:
+    param_names = ["series", "frame"]
+    params = [pd.Series, pd.DataFrame]
+
+    def setup(self, param):
+        N = 1000
+        obj = param(dtype=float)
+        for i in range(N):
+            obj.attrs[i] = i
+        self.obj = obj
+
+    def time_finalize_micro(self, param):
+        self.obj.__finalize__(self.obj, method="__finalize__")
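
Reviewer note: a small sketch of the behaviour this micro-benchmark appears to time, assuming `__finalize__` copies `attrs` from its first argument onto the object it is called on and returns that object. The names `src`, `dst`, `result`, and the `"source"` key are hypothetical.

```python
import pandas as pd

# Source object carrying metadata in the attrs dictionary.
src = pd.Series(dtype=float)
src.attrs["source"] = "sensor-1"

# Destination object starts with empty attrs.
dst = pd.Series(dtype=float)

# __finalize__ propagates metadata from src to dst and returns dst.
result = dst.__finalize__(src)

print(dst.attrs)
```

The benchmark fills `attrs` with 1000 entries first, so the measured cost is presumably dominated by copying that dictionary.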

Diff for: asv_bench/benchmarks/frame_ctor.py

+45
@@ -1,5 +1,6 @@
 import numpy as np
 
+import pandas as pd
 from pandas import DataFrame, MultiIndex, Series, Timestamp, date_range
 
 from .pandas_vb_common import tm
@@ -118,4 +119,48 @@ def time_frame_from_range(self):
         self.df = DataFrame(self.data)
 
 
+class FromArrays:
+
+    goal_time = 0.2
+
+    def setup(self):
+        N_rows = 1000
+        N_cols = 1000
+        self.float_arrays = [np.random.randn(N_rows) for _ in range(N_cols)]
+        self.sparse_arrays = [
+            pd.arrays.SparseArray(np.random.randint(0, 2, N_rows), dtype="float64")
+            for _ in range(N_cols)
+        ]
+        self.int_arrays = [
+            pd.array(np.random.randint(1000, size=N_rows), dtype="Int64")
+            for _ in range(N_cols)
+        ]
+        self.index = pd.Index(range(N_rows))
+        self.columns = pd.Index(range(N_cols))
+
+    def time_frame_from_arrays_float(self):
+        self.df = DataFrame._from_arrays(
+            self.float_arrays,
+            index=self.index,
+            columns=self.columns,
+            verify_integrity=False,
+        )
+
+    def time_frame_from_arrays_int(self):
+        self.df = DataFrame._from_arrays(
+            self.int_arrays,
+            index=self.index,
+            columns=self.columns,
+            verify_integrity=False,
+        )
+
+    def time_frame_from_arrays_sparse(self):
+        self.df = DataFrame._from_arrays(
+            self.sparse_arrays,
+            index=self.index,
+            columns=self.columns,
+            verify_integrity=False,
+        )
+
+
 from .pandas_vb_common import setup # noqa: F401 isort:skip
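
Reviewer note: the `FromArrays` benchmark calls the private `DataFrame._from_arrays`. A rough public-API analogue, assuming a dict of extension arrays is an acceptable stand-in (it is not the code path being timed), might look like the sketch below. The sizes and the names `n_rows`, `n_cols`, and `int_arrays` are illustrative.

```python
import numpy as np
import pandas as pd

n_rows, n_cols = 4, 3

# Nullable-integer extension arrays, one per column (same dtype as the benchmark).
int_arrays = [
    pd.array(np.arange(n_rows) + col, dtype="Int64") for col in range(n_cols)
]

# Public constructor: map column label -> array, with an explicit index.
df = pd.DataFrame(dict(enumerate(int_arrays)), index=pd.Index(range(n_rows)))

print(df.dtypes)
```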

Diff for: asv_bench/benchmarks/frame_methods.py

+13
@@ -619,4 +619,17 @@ def time_select_dtypes(self, n):
         self.df.select_dtypes(include="int")
 
 
+class MemoryUsage:
+    def setup(self):
+        self.df = DataFrame(np.random.randn(100000, 2), columns=list("AB"))
+        self.df2 = self.df.copy()
+        self.df2["A"] = self.df2["A"].astype("object")
+
+    def time_memory_usage(self):
+        self.df.memory_usage(deep=True)
+
+    def time_memory_usage_object_dtype(self):
+        self.df2.memory_usage(deep=True)
+
+
 from .pandas_vb_common import setup # noqa: F401 isort:skip
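
Reviewer note: a short sketch of why the benchmark has a separate object-dtype case, assuming the documented behaviour of `DataFrame.memory_usage(deep=True)`, which inspects each Python object in object-dtype columns. The frame size and the names `shallow` and `deep` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 2), columns=list("AB"))
df2 = df.copy()
# Column "A" now holds boxed Python floats instead of a float64 buffer.
df2["A"] = df2["A"].astype("object")

shallow = df2.memory_usage()         # counts only the array buffers
deep = df2.memory_usage(deep=True)   # also sizes every object in "A"

print(shallow["A"], deep["A"])
```

For the float64 column `"B"` both calls should report the same number of bytes; only the object column requires a per-element pass, which is the slower path being timed.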

Diff for: asv_bench/benchmarks/indexing.py

+25
@@ -308,6 +308,31 @@ def time_frame_getitem_single_column_int(self):
         self.df_int_col[0]
 
 
+class IndexSingleRow:
+    params = [True, False]
+    param_names = ["unique_cols"]
+
+    def setup(self, unique_cols):
+        arr = np.arange(10 ** 7).reshape(-1, 10)
+        df = DataFrame(arr)
+        dtypes = ["u1", "u2", "u4", "u8", "i1", "i2", "i4", "i8", "f8", "f4"]
+        for i, d in enumerate(dtypes):
+            df[i] = df[i].astype(d)
+
+        if not unique_cols:
+            # GH#33032 single-row lookups with non-unique columns were
+            # 15x slower than with unique columns
+            df.columns = ["A", "A"] + list(df.columns[2:])
+
+        self.df = df
+
+    def time_iloc_row(self, unique_cols):
+        self.df.iloc[10000]
+
+    def time_loc_row(self, unique_cols):
+        self.df.loc[10000]
+
+
 class AssignTimeseriesIndex:
     def setup(self):
         N = 100000
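
Reviewer note: a minimal sketch of the single-row lookup that `IndexSingleRow` times when column labels are duplicated. It assumes a tiny single-dtype frame for brevity, whereas the benchmark casts each column to a different dtype; the names `row_by_position` and `row_by_label` are illustrative.

```python
import numpy as np
import pandas as pd

# 4 rows x 10 columns; the benchmark uses 10 ** 6 rows.
arr = np.arange(40).reshape(-1, 10)
df = pd.DataFrame(arr)

# Duplicate the first two column labels, as the benchmark does for unique_cols=False.
df.columns = ["A", "A"] + list(df.columns[2:])

# Both lookups return the third row as a Series indexed by the column labels.
row_by_position = df.iloc[2]
row_by_label = df.loc[2]

print(row_by_position.index.tolist())
```

The row index is still unique here, so `.loc[2]` and `.iloc[2]` select the same row; only the column labels are non-unique.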
