Commit 0e9be35

Merge branch 'master' of https://github.com/pandas-dev/pandas into tana
2 parents eb1c1eb + 0c4113f commit 0e9be35

46 files changed, +2090 -704 lines changed

Diff for: .gitignore

+2 -2
@@ -101,14 +101,14 @@ asv_bench/pandas/
 # Documentation generated files #
 #################################
 doc/source/generated
-doc/source/api/generated
+doc/source/user_guide/styled.xlsx
+doc/source/reference/api
 doc/source/_static
 doc/source/vbench
 doc/source/vbench.rst
 doc/source/index.rst
 doc/build/html/index.html
 # Windows specific leftover:
 doc/tmp.sv
-doc/source/styled.xlsx
 env/
 doc/source/savefig/

Diff for: asv_bench/benchmarks/index_object.py

+2 -1
@@ -138,7 +138,8 @@ def setup(self, dtype):
         self.sorted = self.idx.sort_values()
         half = N // 2
         self.non_unique = self.idx[:half].append(self.idx[:half])
-        self.non_unique_sorted = self.sorted[:half].append(self.sorted[:half])
+        self.non_unique_sorted = (self.sorted[:half].append(self.sorted[:half])
+                                  .sort_values())
         self.key = self.sorted[N // 4]
 
     def time_boolean_array(self, dtype):
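
Note on the change above: appending the first half of a sorted index to itself yields values that restart partway through, so the result is not actually sorted until `sort_values()` is applied. The following is a minimal sketch of that effect using plain Python lists as a stand-in for the benchmark's Index objects; the names `sorted_values` and `is_non_decreasing` are hypothetical and do not appear in the benchmark.

```python
# Plain-list stand-in for the benchmark's Index objects (assumption: the
# ordering behaviour is the same as for Index.append / Index.sort_values).
N = 10
sorted_values = sorted(range(N))
half = N // 2


def is_non_decreasing(values):
    """Return True if every element is >= the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Old benchmark setup: first half appended to itself, e.g. 0..4 then 0..4.
appended = sorted_values[:half] + sorted_values[:half]

# New benchmark setup: the appended values are sorted again.
resorted = sorted(appended)

print(is_non_decreasing(appended))  # the run restarts at the midpoint
print(is_non_decreasing(resorted))
```

With the extra sort, the attribute name `non_unique_sorted` matches what the benchmark actually holds: duplicated values in sorted order.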

Diff for: doc/make.py

+3 -3
@@ -53,7 +53,7 @@ def __init__(self, num_jobs=0, include_api=True, single_doc=None,
         if single_doc and single_doc.endswith('.rst'):
             self.single_doc_html = os.path.splitext(single_doc)[0] + '.html'
         elif single_doc:
-            self.single_doc_html = 'api/generated/pandas.{}.html'.format(
+            self.single_doc_html = 'reference/api/pandas.{}.html'.format(
                 single_doc)
 
     def _process_single_doc(self, single_doc):
@@ -63,7 +63,7 @@ def _process_single_doc(self, single_doc):
 
         For example, categorial.rst or pandas.DataFrame.head. For the latter,
         return the corresponding file path
-        (e.g. generated/pandas.DataFrame.head.rst).
+        (e.g. reference/api/pandas.DataFrame.head.rst).
         """
         base_name, extension = os.path.splitext(single_doc)
         if extension in ('.rst', '.ipynb'):
@@ -258,7 +258,7 @@ def clean():
         Clean documentation generated files.
         """
         shutil.rmtree(BUILD_PATH, ignore_errors=True)
-        shutil.rmtree(os.path.join(SOURCE_PATH, 'api', 'generated'),
+        shutil.rmtree(os.path.join(SOURCE_PATH, 'reference', 'api'),
                       ignore_errors=True)
 
     def zip_html(self):
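
Note on the change above: the first hunk decides which HTML page a single-document build should open. The sketch below restates that branch as a standalone function so the new `reference/api/` output path is easy to see. The function name `single_doc_html` and the sample arguments are hypothetical, and it assumes `single_doc` has already been reduced to either an `.rst` path or an object name such as `DataFrame.head`.

```python
import os


def single_doc_html(single_doc):
    """Sketch of the branch in __init__ that picks the HTML page to open.

    Hypothetical standalone helper; in doc/make.py the result is stored
    on self.single_doc_html instead of being returned.
    """
    if single_doc and single_doc.endswith('.rst'):
        # A documentation page: same path with the extension swapped.
        return os.path.splitext(single_doc)[0] + '.html'
    elif single_doc:
        # An API object: its page now lives under reference/api/.
        return 'reference/api/pandas.{}.html'.format(single_doc)
    return None


print(single_doc_html('getting_started/overview.rst'))
print(single_doc_html('DataFrame.head'))
```

The same `reference/api` location is what `clean()` removes and what the `.gitignore` entry excludes, so all three changes in this commit need to stay in step.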

Diff for: doc/redirects.csv

+1,540
Large diffs are not rendered by default.

Diff for: doc/source/getting_started/comparison/index.rst

+15
@@ -0,0 +1,15 @@
+{{ header }}
+
+.. _comparison:
+
+===========================
+Comparison with other tools
+===========================
+
+.. toctree::
+    :maxdepth: 2
+
+    comparison_with_r
+    comparison_with_sql
+    comparison_with_sas
+    comparison_with_stata

Diff for: doc/source/getting_started/index.rst

+1
@@ -13,4 +13,5 @@ Getting started
     10min
     basics
     dsintro
+    comparison/index
     tutorials

Diff for: doc/source/getting_started/overview.rst

+74 -19
@@ -6,25 +6,80 @@
 Package overview
 ****************
 
-:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
-easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
-programming language.
-
-:mod:`pandas` consists of the following elements:
-
-* A set of labeled array data structures, the primary of which are
-  Series and DataFrame.
-* Index objects enabling both simple axis indexing and multi-level /
-  hierarchical axis indexing.
-* An integrated group by engine for aggregating and transforming data sets.
-* Date range generation (date_range) and custom date offsets enabling the
-  implementation of customized frequencies.
-* Input/Output tools: loading tabular data from flat files (CSV, delimited,
-  Excel 2003), and saving and loading pandas objects from the fast and
-  efficient PyTables/HDF5 format.
-* Memory-efficient "sparse" versions of the standard data structures for storing
-  data that is mostly missing or mostly constant (some fixed value).
-* Moving window statistics (rolling mean, rolling standard deviation, etc.).
+**pandas** is a `Python <https://www.python.org>`__ package providing fast,
+flexible, and expressive data structures designed to make working with
+"relational" or "labeled" data both easy and intuitive. It aims to be the
+fundamental high-level building block for doing practical, **real world** data
+analysis in Python. Additionally, it has the broader goal of becoming **the
+most powerful and flexible open source data analysis / manipulation tool
+available in any language**. It is already well on its way toward this goal.
+
+pandas is well suited for many different kinds of data:
+
+- Tabular data with heterogeneously-typed columns, as in an SQL table or
+  Excel spreadsheet
+- Ordered and unordered (not necessarily fixed-frequency) time series data.
+- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
+  column labels
+- Any other form of observational / statistical data sets. The data actually
+  need not be labeled at all to be placed into a pandas data structure
+
+The two primary data structures of pandas, :class:`Series` (1-dimensional)
+and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
+cases in finance, statistics, social science, and many areas of
+engineering. For R users, :class:`DataFrame` provides everything that R's
+``data.frame`` provides and much more. pandas is built on top of `NumPy
+<https://www.numpy.org>`__ and is intended to integrate well within a scientific
+computing environment with many other 3rd party libraries.
+
+Here are just a few of the things that pandas does well:
+
+- Easy handling of **missing data** (represented as NaN) in floating point as
+  well as non-floating point data
+- Size mutability: columns can be **inserted and deleted** from DataFrame and
+  higher dimensional objects
+- Automatic and explicit **data alignment**: objects can be explicitly
+  aligned to a set of labels, or the user can simply ignore the labels and
+  let `Series`, `DataFrame`, etc. automatically align the data for you in
+  computations
+- Powerful, flexible **group by** functionality to perform
+  split-apply-combine operations on data sets, for both aggregating and
+  transforming data
+- Make it **easy to convert** ragged, differently-indexed data in other
+  Python and NumPy data structures into DataFrame objects
+- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
+  of large data sets
+- Intuitive **merging** and **joining** data sets
+- Flexible **reshaping** and pivoting of data sets
+- **Hierarchical** labeling of axes (possible to have multiple labels per
+  tick)
+- Robust IO tools for loading data from **flat files** (CSV and delimited),
+  Excel files, databases, and saving / loading data from the ultrafast **HDF5
+  format**
+- **Time series**-specific functionality: date range generation and frequency
+  conversion, moving window statistics, moving window linear regressions,
+  date shifting and lagging, etc.
+
+Many of these principles are here to address the shortcomings frequently
+experienced using other languages / scientific research environments. For data
+scientists, working with data is typically divided into multiple stages:
+munging and cleaning data, analyzing / modeling it, then organizing the results
+of the analysis into a form suitable for plotting or tabular display. pandas
+is the ideal tool for all of these tasks.
+
+Some other notes
+
+- pandas is **fast**. Many of the low-level algorithmic bits have been
+  extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
+  anything else generalization usually sacrifices performance. So if you focus
+  on one feature for your application you may be able to create a faster
+  specialized tool.
+
+- pandas is a dependency of `statsmodels
+  <https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
+  statistical computing ecosystem in Python.
+
+- pandas has been used extensively in production in financial applications.
 
 Data Structures
 ---------------

Diff for: doc/source/index.rst.template

+18 -107
@@ -1,141 +1,52 @@
 .. pandas documentation master file, created by
 
+.. module:: pandas
+
 *********************************************
 pandas: powerful Python data analysis toolkit
 *********************************************
 
-`PDF Version <pandas.pdf>`__
-
-`Zipped HTML <pandas.zip>`__
-
-.. module:: pandas
-
 **Date**: |today| **Version**: |version|
 
-**Binary Installers:** https://pypi.org/project/pandas
-
-**Source Repository:** https://github.com/pandas-dev/pandas
-
-**Issues & Ideas:** https://github.com/pandas-dev/pandas/issues
-
-**Q&A Support:** https://stackoverflow.com/questions/tagged/pandas
-
-**Developer Mailing List:** https://groups.google.com/forum/#!forum/pydata
-
-**pandas** is a `Python <https://www.python.org>`__ package providing fast,
-flexible, and expressive data structures designed to make working with
-"relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real world** data
-analysis in Python. Additionally, it has the broader goal of becoming **the
-most powerful and flexible open source data analysis / manipulation tool
-available in any language**. It is already well on its way toward this goal.
-
-pandas is well suited for many different kinds of data:
-
-- Tabular data with heterogeneously-typed columns, as in an SQL table or
-  Excel spreadsheet
-- Ordered and unordered (not necessarily fixed-frequency) time series data.
-- Arbitrary matrix data (homogeneously typed or heterogeneous) with row and
-  column labels
-- Any other form of observational / statistical data sets. The data actually
-  need not be labeled at all to be placed into a pandas data structure
-
-The two primary data structures of pandas, :class:`Series` (1-dimensional)
-and :class:`DataFrame` (2-dimensional), handle the vast majority of typical use
-cases in finance, statistics, social science, and many areas of
-engineering. For R users, :class:`DataFrame` provides everything that R's
-``data.frame`` provides and much more. pandas is built on top of `NumPy
-<https://www.numpy.org>`__ and is intended to integrate well within a scientific
-computing environment with many other 3rd party libraries.
-
-Here are just a few of the things that pandas does well:
-
-- Easy handling of **missing data** (represented as NaN) in floating point as
-  well as non-floating point data
-- Size mutability: columns can be **inserted and deleted** from DataFrame and
-  higher dimensional objects
-- Automatic and explicit **data alignment**: objects can be explicitly
-  aligned to a set of labels, or the user can simply ignore the labels and
-  let `Series`, `DataFrame`, etc. automatically align the data for you in
-  computations
-- Powerful, flexible **group by** functionality to perform
-  split-apply-combine operations on data sets, for both aggregating and
-  transforming data
-- Make it **easy to convert** ragged, differently-indexed data in other
-  Python and NumPy data structures into DataFrame objects
-- Intelligent label-based **slicing**, **fancy indexing**, and **subsetting**
-  of large data sets
-- Intuitive **merging** and **joining** data sets
-- Flexible **reshaping** and pivoting of data sets
-- **Hierarchical** labeling of axes (possible to have multiple labels per
-  tick)
-- Robust IO tools for loading data from **flat files** (CSV and delimited),
-  Excel files, databases, and saving / loading data from the ultrafast **HDF5
-  format**
-- **Time series**-specific functionality: date range generation and frequency
-  conversion, moving window statistics, moving window linear regressions,
-  date shifting and lagging, etc.
-
-Many of these principles are here to address the shortcomings frequently
-experienced using other languages / scientific research environments. For data
-scientists, working with data is typically divided into multiple stages:
-munging and cleaning data, analyzing / modeling it, then organizing the results
-of the analysis into a form suitable for plotting or tabular display. pandas
-is the ideal tool for all of these tasks.
-
-Some other notes
-
-- pandas is **fast**. Many of the low-level algorithmic bits have been
-  extensively tweaked in `Cython <https://cython.org>`__ code. However, as with
-  anything else generalization usually sacrifices performance. So if you focus
-  on one feature for your application you may be able to create a faster
-  specialized tool.
-
-- pandas is a dependency of `statsmodels
-  <https://www.statsmodels.org/stable/index.html>`__, making it an important part of the
-  statistical computing ecosystem in Python.
-
-- pandas has been used extensively in production in financial applications.
-
-.. note::
+**Download documentation**: `PDF Version <pandas.pdf>`__ | `Zipped HTML <pandas.zip>`__
 
-    This documentation assumes general familiarity with NumPy. If you haven't
-    used NumPy much or at all, do invest some time in `learning about NumPy
-    <https://docs.scipy.org>`__ first.
+**Useful links**:
+`Binary Installers <https://pypi.org/project/pandas>`__ |
+`Source Repository <https://github.com/pandas-dev/pandas>`__ |
+`Issues & Ideas <https://github.com/pandas-dev/pandas/issues>`__ |
+`Q&A Support <https://stackoverflow.com/questions/tagged/pandas>`__ |
+`Mailing List <https://groups.google.com/forum/#!forum/pydata>`__
 
-See the package overview for more detail about what's in the library.
+:mod:`pandas` is an open source, BSD-licensed library providing high-performance,
+easy-to-use data structures and data analysis tools for the `Python <https://www.python.org/>`__
+programming language.
 
+See the :ref:`overview` for more detail about what's in the library.
 
 {% if single_doc and single_doc.endswith('.rst') -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 
     {{ single_doc[:-4] }}
 {% elif single_doc %}
 .. autosummary::
-    :toctree: api/generated/
+    :toctree: reference/api/
 
     {{ single_doc }}
 {% else -%}
 .. toctree::
-    :maxdepth: 4
+    :maxdepth: 2
 {% endif %}
 
 {% if not single_doc -%}
-    What's New <whatsnew/v0.24.0>
+    What's New in 0.24.0 <whatsnew/v0.24.0>
     install
     getting_started/index
-    cookbook
     user_guide/index
-    r_interface
     ecosystem
-    comparison_with_r
-    comparison_with_sql
-    comparison_with_sas
-    comparison_with_stata
 {% endif -%}
 {% if include_api -%}
-    api/index
+    reference/index
 {% endif -%}
 {% if not single_doc -%}
     development/index
