Commit 8e2b974

committed
Merge branch 'main' into 9586-inconsistent-labeling-sub-daily-super-daily-frequencies
2 parents 6522436 + 32f789f commit 8e2b974

266 files changed (+4323, -2992 lines)

Diff for: .circleci/setup_env.sh (+1 -2)

@@ -55,8 +55,7 @@ if pip list | grep -q ^pandas; then
 fi

 echo "Build extensions"
-# GH 47305: Parallel build can causes flaky ImportError from pandas/_libs/tslibs
-python setup.py build_ext -q -j1
+python setup.py build_ext -q -j4

 echo "Install pandas"
 python -m pip install --no-build-isolation --no-use-pep517 -e .

Diff for: .github/actions/build_pandas/action.yml (+2 -4)

@@ -16,7 +16,5 @@ runs:
 python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index
 shell: bash -el {0}
 env:
-# Cannot use parallel compilation on Windows, see https://github.com/pandas-dev/pandas/issues/30873
-# GH 47305: Parallel build causes flaky ImportError: /home/runner/work/pandas/pandas/pandas/_libs/tslibs/timestamps.cpython-38-x86_64-linux-gnu.so: undefined symbol: pandas_datetime_to_datetimestruct
-N_JOBS: 1
-#N_JOBS: ${{ runner.os == 'Windows' && 1 || 2 }}
+# https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
+N_JOBS: ${{ runner.os == 'macOS' && 3 || 2 }}
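
The new `N_JOBS` value relies on the `cond && a || b` idiom. GitHub Actions expressions have no ternary operator, and their `&&` and `||` short-circuit and return one of their operands rather than a plain boolean. The Python code below is a minimal emulation of those two operators only. It assumes the falsy values are `false`, `null`, the number `0` and the empty string. `gh_truthy`, `gh_and`, `gh_or` and `n_jobs` are hypothetical names and are not part of any GitHub tooling.

```python
def gh_truthy(value) -> bool:
    """Approximate GitHub Actions truthiness (assumed rules).

    Falsy: false, null, the number 0 and the empty string. Everything else,
    including the non-empty string "0", is treated as truthy.
    """
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value != 0
    if isinstance(value, str):
        return value != ""
    return True


def gh_and(left, right):
    # `left && right` returns `left` if it is falsy, otherwise `right`
    return right if gh_truthy(left) else left


def gh_or(left, right):
    # `left || right` returns `left` if it is truthy, otherwise `right`
    return left if gh_truthy(left) else right


def n_jobs(runner_os: str):
    # ${{ runner.os == 'macOS' && 3 || 2 }}
    # GitHub compares strings case-insensitively; an exact match is enough
    # here because runner.os is always one of "Linux", "Windows", "macOS".
    return gh_or(gh_and(runner_os == "macOS", 3), 2)


for os_name in ("Linux", "Windows", "macOS"):
    print(os_name, n_jobs(os_name))

# Limitation of the idiom: a falsy "then" value silently becomes the "else" value.
print(gh_or(gh_and(True, 0), 2))
```

Under these assumptions, the expression gives 3 on macOS and 2 on Linux and Windows. The last line shows why the idiom is only safe when the middle operand is truthy.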

Diff for: .github/actions/setup-conda/action.yml (+1 -1)

@@ -30,7 +30,7 @@ runs:
 environment-name: ${{ inputs.environment-name }}
 extra-specs: ${{ inputs.extra-specs }}
 channels: conda-forge
-channel-priority: ${{ runner.os == 'macOS' && 'flexible' || 'strict' }}
+channel-priority: 'strict'
 condarc-file: ci/condarc.yml
 cache-env: true
 cache-downloads: true

Diff for: .github/dependabot.yml (+9)

@@ -0,0 +1,9 @@
+version: 2
+updates:
+- package-ecosystem: github-actions
+directory: /
+schedule:
+interval: weekly
+labels:
+- "CI"
+- "Dependencies"

Diff for: .github/workflows/32-bit-linux.yml (+1 -1)

@@ -40,7 +40,7 @@ jobs:
 python -m pip install --no-deps -U pip wheel 'setuptools<60.0.0' && \
 python -m pip install versioneer[toml] && \
 python -m pip install cython numpy python-dateutil pytz pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-asyncio>=0.17 hypothesis>=6.34.2 && \
-python setup.py build_ext -q -j1 && \
+python setup.py build_ext -q -j$(nproc) && \
 python -m pip install --no-build-isolation --no-use-pep517 -e . && \
 python -m pip list && \
 export PANDAS_CI=1 && \
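
The 32-bit job now runs `-j$(nproc)` instead of `-j1`, so the extension build uses one job per processing unit the job can see. The Python code below is a rough analogue and an assumption, because the workflow itself uses the shell. On Linux, `os.sched_getaffinity(0)` returns the set of CPUs the process may run on, which is what `nproc` counts there. `build_ext_command` is a hypothetical helper that only assembles the argument list and does not run a build.

```python
from __future__ import annotations

import os


def available_cpus() -> int:
    """Roughly what `nproc` reports: CPUs this process is allowed to run on.

    os.sched_getaffinity exists only on some platforms (e.g. Linux), so fall
    back to the machine-wide count from os.cpu_count() elsewhere.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def build_ext_command(jobs: int | None = None) -> list[str]:
    """Assemble argv for `python setup.py build_ext -q -j<N>` (not executed).

    With jobs=None this mirrors `-j$(nproc)`; jobs=1 mirrors the old `-j1`.
    """
    n = jobs if jobs is not None else available_cpus()
    return ["python", "setup.py", "build_ext", "-q", f"-j{n}"]


print(" ".join(build_ext_command()))
```

Passing `jobs=1` reproduces the serial build that this commit removes.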

Diff for: .github/workflows/code-checks.yml (+2 -2)

@@ -35,7 +35,7 @@ jobs:
 python-version: '3.9'

 - name: Run pre-commit
-uses: pre-commit/action@v2.0.3
+uses: pre-commit/action@v3.0.0
 with:
 extra_args: --verbose --all-files

@@ -93,7 +93,7 @@ jobs:
 if: ${{ steps.build.outcome == 'success' && always() }}

 - name: Typing + pylint
-uses: pre-commit/action@v2.0.3
+uses: pre-commit/action@v3.0.0
 with:
 extra_args: --verbose --hook-stage manual --all-files
 if: ${{ steps.build.outcome == 'success' && always() }}

Diff for: .github/workflows/python-dev.yml (+1 -2)

@@ -82,10 +82,9 @@ jobs:
 python -m pip install python-dateutil pytz cython hypothesis>=6.34.2 pytest>=7.0.0 pytest-xdist>=2.2.0 pytest-cov pytest-asyncio>=0.17
 python -m pip list

-# GH 47305: Parallel build can cause flaky ImportError from pandas/_libs/tslibs
 - name: Build Pandas
 run: |
-python setup.py build_ext -q -j1
+python setup.py build_ext -q -j4
 python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index

 - name: Build Version

Diff for: .github/workflows/stale-pr.yml (+1 -1)

@@ -13,7 +13,7 @@ jobs:
 pull-requests: write
 runs-on: ubuntu-22.04
 steps:
-- uses: actions/stale@v4
+- uses: actions/stale@v8
 with:
 repo-token: ${{ secrets.GITHUB_TOKEN }}
 stale-pr-message: "This pull request is stale because it has been open for thirty days with no activity. Please [update](https://pandas.pydata.org/pandas-docs/stable/development/contributing.html#updating-your-pull-request) and respond to this comment if you're still interested in working on this."

Diff for: .github/workflows/ubuntu.yml (+19 -6)

@@ -26,9 +26,9 @@ jobs:
 strategy:
 matrix:
 env_file: [actions-38.yaml, actions-39.yaml, actions-310.yaml, actions-311.yaml]
-pattern: ["not single_cpu", "single_cpu"]
+# Prevent the include jobs from overriding other jobs
+pattern: [""]
 pyarrow_version: ["8", "9", "10"]
-pandas_ci: [1]
 include:
 - name: "Downstream Compat"
 env_file: actions-38-downstream_compat.yaml
@@ -75,7 +75,7 @@ jobs:
 test_args: "-W error::DeprecationWarning -W error::FutureWarning"
 # TODO(cython3): Re-enable once next-beta(after beta 1) comes out
 # There are some warnings failing the build with -werror
-pandas_ci: 0
+pandas_ci: "0"
 exclude:
 - env_file: actions-38.yaml
 pyarrow_version: "8"
@@ -99,9 +99,9 @@ jobs:
 LC_ALL: ${{ matrix.lc_all || '' }}
 PANDAS_DATA_MANAGER: ${{ matrix.pandas_data_manager || 'block' }}
 PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
-PANDAS_CI: ${{ matrix.pandas_ci }}
+PANDAS_CI: ${{ matrix.pandas_ci || '1' }}
 TEST_ARGS: ${{ matrix.test_args || '' }}
-PYTEST_WORKERS: ${{ contains(matrix.pattern, 'not single_cpu') && 'auto' || '1' }}
+PYTEST_WORKERS: 'auto'
 PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
 IS_PYPY: ${{ contains(matrix.env_file, 'pypy') }}
 # TODO: re-enable coverage on pypy, its slow
@@ -170,9 +170,22 @@ jobs:
 pyarrow-version: ${{ matrix.pyarrow_version }}

 - name: Build Pandas
+id: build
 uses: ./.github/actions/build_pandas

-- name: Test
+- name: Test (not single_cpu)
 uses: ./.github/actions/run-tests
 # TODO: Don't continue on error for PyPy
 continue-on-error: ${{ env.IS_PYPY == 'true' }}
+env:
+# Set pattern to not single_cpu if not already set
+PATTERN: ${{ env.PATTERN == '' && 'not single_cpu' || matrix.pattern }}
+
+- name: Test (single_cpu)
+uses: ./.github/actions/run-tests
+# TODO: Don't continue on error for PyPy
+continue-on-error: ${{ env.IS_PYPY == 'true' }}
+env:
+PATTERN: 'single_cpu'
+PYTEST_WORKERS: 1
+if: ${{ matrix.pattern == '' && (always() && steps.build.outcome == 'success')}}
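
This commit removes the `pandas_ci` matrix axis, so most jobs now get `PANDAS_CI` from the default in `${{ matrix.pandas_ci || '1' }}`. In GitHub expressions the number `0` is falsy, while a non-empty string such as `"0"` is truthy. That difference appears to be why the include entry changes from `pandas_ci: 0` to `pandas_ci: "0"`: with the bare number, `||` would fall through to `'1'` and re-enable `PANDAS_CI` for that job. The sketch below models only the `||` operator under those assumed truthiness rules. It represents an unset matrix key as `None`, and its helper names are hypothetical.

```python
def gh_truthy(value) -> bool:
    # Assumed GitHub Actions rules: false, null, the number 0 and "" are falsy;
    # any non-empty string (including "0") is truthy.
    if value is None or value is False:
        return False
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value != 0
    if isinstance(value, str):
        return value != ""
    return True


def gh_or(left, right):
    # `left || right` returns `left` if it is truthy, otherwise `right`
    return left if gh_truthy(left) else right


def pandas_ci_env(matrix: dict):
    # PANDAS_CI: ${{ matrix.pandas_ci || '1' }}
    # A matrix key that is not set is modeled as None (null).
    return gh_or(matrix.get("pandas_ci"), "1")


print(pandas_ci_env({}))                  # job without the key
print(pandas_ci_env({"pandas_ci": "0"}))  # include entry as a string
print(pandas_ci_env({"pandas_ci": 0}))    # include entry as a number
```

Under these assumptions the three calls print `1`, `0` and `1`. Only the string form keeps `PANDAS_CI` at `0`.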

Diff for: .gitignore (+3)

@@ -53,6 +53,9 @@ dist
 # type checkers
 pandas/py.typed

+# pyenv
+.python-version
+
 # tox testing tool
 .tox
 # rope

Diff for: .pre-commit-config.yaml (+3 -11)

@@ -28,7 +28,7 @@ repos:
 types_or: [python, pyi]
 additional_dependencies: [black==23.1.0]
 - repo: https://github.com/charliermarsh/ruff-pre-commit
-rev: v0.0.255
+rev: v0.0.259
 hooks:
 - id: ruff
 args: [--exit-non-zero-on-fix]
@@ -392,14 +392,6 @@ repos:
 files: ^pandas/
 exclude: ^(pandas/_libs/|pandas/tests/|pandas/errors/__init__.py$|pandas/_version.py)
 types: [python]
-- id: flake8-pyi
-name: flake8-pyi
-entry: flake8 --extend-ignore=E301,E302,E305,E701,E704
-types: [pyi]
-language: python
-additional_dependencies:
-- flake8==5.0.4
-- flake8-pyi==22.8.1
 - id: future-annotations
 name: import annotations from __future__
 entry: 'from __future__ import annotations'
@@ -421,8 +413,8 @@ repos:
 language: python
 stages: [manual]
 additional_dependencies:
-- autotyping==22.9.0
-- libcst==0.4.7
+- autotyping==23.3.0
+- libcst==0.4.9
 - id: check-test-naming
 name: check that test names start with 'test'
 entry: python -m scripts.check_test_naming

Diff for: MANIFEST.in (+2)

@@ -58,3 +58,5 @@ prune pandas/tests/io/parser/data
 # Selectively re-add *.cxx files that were excluded above
 graft pandas/_libs/src
 graft pandas/_libs/tslibs/src
+include pandas/_libs/pd_parser.h
+include pandas/_libs/pd_parser.c

Diff for: asv_bench/benchmarks/arithmetic.py (+4)

@@ -266,10 +266,14 @@ def setup(self, tz):
 self.ts = self.s[halfway]

 self.s2 = Series(date_range("20010101", periods=N, freq="s", tz=tz))
+self.ts_different_reso = Timestamp("2001-01-02", tz=tz)

 def time_series_timestamp_compare(self, tz):
 self.s <= self.ts

+def time_series_timestamp_different_reso_compare(self, tz):
+self.s <= self.ts_different_reso
+
 def time_timestamp_series_compare(self, tz):
 self.ts >= self.s

Diff for: asv_bench/benchmarks/strings.py (-7)

@@ -34,7 +34,6 @@ def setup(self, dtype):

 # GH37371. Testing construction of string series/frames from ExtensionArrays
 self.series_cat_arr = Categorical(self.series_arr)
-self.frame_cat_arr = Categorical(self.frame_arr)

 def time_series_construction(self, dtype):
 Series(self.series_arr, dtype=dtype)
@@ -54,12 +53,6 @@ def time_cat_series_construction(self, dtype):
 def peakmem_cat_series_construction(self, dtype):
 Series(self.series_cat_arr, dtype=dtype)

-def time_cat_frame_construction(self, dtype):
-DataFrame(self.frame_cat_arr, dtype=dtype)
-
-def peakmem_cat_frame_construction(self, dtype):
-DataFrame(self.frame_cat_arr, dtype=dtype)
-

 class Methods(Dtypes):
 def time_center(self, dtype):

Diff for: ci/code_checks.sh (-3)

@@ -86,8 +86,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 MSG='Partially validate docstrings (EX01)' ; echo $MSG
 $BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=EX01 --ignore_functions \
 pandas.Series.index \
-pandas.Series.hasnans \
-pandas.Series.to_list \
 pandas.Series.__iter__ \
 pandas.Series.keys \
 pandas.Series.item \
@@ -309,7 +307,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
 pandas_object \
 pandas.api.interchange.from_dataframe \
 pandas.Index.values \
-pandas.Index.hasnans \
 pandas.Index.dtype \
 pandas.Index.inferred_type \
 pandas.Index.shape \

Diff for: doc/source/_static/reshaping_pivot.png (image, 5.17 KB)

Diff for: doc/source/development/contributing_codebase.rst (+2 -1)

@@ -812,7 +812,8 @@ install pandas) by typing::
 your installation is probably fine and you can start contributing!

 Often it is worth running only a subset of tests first around your changes before running the
-entire suite.
+entire suite (tip: you can use the [pandas-coverage app](https://pandas-coverage.herokuapp.com/)
+to find out which tests hit the lines of code you've modified, and then run only those).

 The easiest way to do this is with::

Diff for: doc/source/development/internals.rst (+3 -25)

@@ -31,31 +31,9 @@ There are functions that make the creation of a regular index easy:
 * :func:`period_range`: fixed frequency date range generated from a time rule or
 DateOffset. An ndarray of :class:`Period` objects, representing timespans

-The motivation for having an ``Index`` class in the first place was to enable
-different implementations of indexing. This means that it's possible for you,
-the user, to implement a custom ``Index`` subclass that may be better suited to
-a particular application than the ones provided in pandas.
-
-From an internal implementation point of view, the relevant methods that an
-``Index`` must define are one or more of the following (depending on how
-incompatible the new object internals are with the ``Index`` functions):
-
-* :meth:`~Index.get_loc`: returns an "indexer" (an integer, or in some cases a
-slice object) for a label
-* :meth:`~Index.slice_locs`: returns the "range" to slice between two labels
-* :meth:`~Index.get_indexer`: Computes the indexing vector for reindexing / data
-alignment purposes. See the source / docstrings for more on this
-* :meth:`~Index.get_indexer_non_unique`: Computes the indexing vector for reindexing / data
-alignment purposes when the index is non-unique. See the source / docstrings
-for more on this
-* :meth:`~Index.reindex`: Does any pre-conversion of the input index then calls
-``get_indexer``
-* :meth:`~Index.union`, :meth:`~Index.intersection`: computes the union or intersection of two
-Index objects
-* :meth:`~Index.insert`: Inserts a new label into an Index, yielding a new object
-* :meth:`~Index.delete`: Delete a label, yielding a new object
-* :meth:`~Index.drop`: Deletes a set of labels
-* :meth:`~Index.take`: Analogous to ndarray.take
+.. warning::
+
+Custom :class:`Index` subclasses are not supported, custom behavior should be implemented using the :class:`ExtensionArray` interface instead.

 MultiIndex
 ~~~~~~~~~~

Diff for: doc/source/getting_started/index.rst (+1 -1)

@@ -533,7 +533,7 @@ Data sets do not only contain numerical data. pandas provides a wide range of fu
 Coming from...
 --------------

-Are you familiar with other software for manipulating tablular data? Learn
+Are you familiar with other software for manipulating tabular data? Learn
 the pandas-equivalent operations compared to software you already know:

 .. panels::

Diff for: doc/source/getting_started/tutorials.rst (+1 -1)

@@ -113,7 +113,7 @@ Various tutorials
 * `Wes McKinney's (pandas BDFL) blog <https://wesmckinney.com/archives.html>`_
 * `Statistical analysis made easy in Python with SciPy and pandas DataFrames, by Randal Olson <http://www.randalolson.com/2012/08/06/statistical-analysis-made-easy-in-python/>`_
 * `Statistical Data Analysis in Python, tutorial videos, by Christopher Fonnesbeck from SciPy 2013 <https://conference.scipy.org/scipy2013/tutorial_detail.php?id=109>`_
-* `Financial analysis in Python, by Thomas Wiecki <https://nbviewer.ipython.org/github/twiecki/financial-analysis-python-tutorial/blob/master/1.%20Pandas%20Basics.ipynb>`_
+* `Financial analysis in Python, by Thomas Wiecki <https://nbviewer.org/github/twiecki/financial-analysis-python-tutorial/blob/master/1.%20Pandas%20Basics.ipynb>`_
 * `Intro to pandas data structures, by Greg Reda <http://www.gregreda.com/2013/10/26/intro-to-pandas-data-structures/>`_
 * `Pandas and Python: Top 10, by Manish Amde <https://manishamde.github.io/blog/2013/03/07/pandas-and-python-top-10/>`_
 * `Pandas DataFrames Tutorial, by Karlijn Willems <https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python>`_

Diff for: doc/source/reference/arrays.rst (+4 -3)

@@ -93,9 +93,10 @@ PyArrow type pandas extension type NumPy

 .. note::

-For string types (``pyarrow.string()``, ``string[pyarrow]``), PyArrow support is still facilitated
-by :class:`arrays.ArrowStringArray` and ``StringDtype("pyarrow")``. See the :ref:`string section <api.arrays.string>`
-below.
+Pyarrow-backed string support is provided by both ``pd.StringDtype("pyarrow")`` and ``pd.ArrowDtype(pa.string())``.
+``pd.StringDtype("pyarrow")`` is described below in the :ref:`string section <api.arrays.string>`
+and will be returned if the string alias ``"string[pyarrow]"`` is specified. ``pd.ArrowDtype(pa.string())``
+generally has better interoperability with :class:`ArrowDtype` of different types.

 While individual values in an :class:`arrays.ArrowExtensionArray` are stored as a PyArrow objects, scalars are **returned**
 as Python scalars corresponding to the data type, e.g. a PyArrow int64 will be returned as Python int, or :class:`NA` for missing

Diff for: doc/source/user_guide/advanced.rst (+1 -1)

@@ -322,7 +322,7 @@ As usual, **both sides** of the slicers are included as this is label indexing.
 .. warning::

 You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and
-for the **columns**. There are some ambiguous cases where the passed indexer could be mis-interpreted
+for the **columns**. There are some ambiguous cases where the passed indexer could be misinterpreted
 as indexing *both* axes, rather than into say the ``MultiIndex`` for the rows.

 You should do this:
