feat: Improve Period parsing #1

Open
2 tasks
Herdi2 opened this issue Feb 26, 2025 · 0 comments

Herdi2 commented Feb 26, 2025

Description

This feature refers to pandas-dev#48000 in the main Pandas repository. That issue is a comprehensive analysis of which kinds of Period can and cannot be created from a string alone. There are eight categories of dates that cannot be created from string representations alone, listed below as items 1 to 8, followed by a ninth item on multi-year spans:

Features to implement

To be implemented

1 Business years that do not run from January to December
There needs to be a way to specify these unambiguously.
2 Quarters starting with a different month than January/April/July/October
There needs to be a way to specify these unambiguously.
4 Business days
The original issue does not seem to give a use case for these, but this refers to getting all business days in a date range.
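
As a rough illustration of what items 1 and 2 need to express, the sketch below computes the calendar dates covered by a fiscal quarter when the fiscal year ends in an arbitrary month. The function name `fiscal_quarter_bounds` and its parameters are hypothetical and not part of pandas. It assumes a fiscal year is labelled by the calendar year in which it ends, which is an assumption of this sketch rather than something the issue specifies.

```python
import calendar
from datetime import date


def fiscal_quarter_bounds(fiscal_year, quarter, fy_end_month=12):
    """Return (first_day, last_day) of a fiscal quarter.

    Hypothetical helper. Assumes the fiscal year is named after the
    calendar year in which it ends, and that it ends in `fy_end_month`.
    """
    if not 1 <= quarter <= 4:
        raise ValueError(f"quarter must be 1..4, got {quarter}")
    if not 1 <= fy_end_month <= 12:
        raise ValueError(f"fy_end_month must be 1..12, got {fy_end_month}")

    # Count months from a fixed origin so that year rollover is plain arithmetic.
    fy_end_index = fiscal_year * 12 + (fy_end_month - 1)
    # Quarter 4 ends with the fiscal year; each earlier quarter ends 3 months before.
    end_index = fy_end_index - 3 * (4 - quarter)
    start_index = end_index - 2

    start_year, start_month0 = divmod(start_index, 12)
    end_year, end_month0 = divmod(end_index, 12)
    last_day = calendar.monthrange(end_year, end_month0 + 1)[1]
    return (
        date(start_year, start_month0 + 1, 1),
        date(end_year, end_month0 + 1, last_day),
    )


# Fiscal year ending in March: Q1 of fiscal 2022 starts in April 2021.
print(fiscal_quarter_bounds(2022, 1, fy_end_month=3))
```

Whatever string syntax is chosen for these periods has to carry both the quarter label and the fiscal-year end month, since the same label maps to different dates depending on that month.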

Implemented

3 Weeks
There needs to be a way to represent a week as a string. Two ways were originally proposed: follow the ISO 8601 week format, such as 2022-W31, or use the string produced by printing a week period, e.g. str(pd.Period("2017-01-25", freq="W")) => "2017-01-23/2017-01-29".
5 Hours
A problem raised is that hours are interpreted as minutes due to their string representation, e.g. "2010-11-12 13:00". A proposed solution is to create them in a less ambiguous way, such as "2010-11-12 13h". The misinterpretation is visible when running p.freq, which gives <Minute> instead of <Hour> as output.
6 Weeks in the 60s, 70s, 80s or 90s of any century
These lead to DateParseError when trying to recreate them from their string representation.

import pandas as pd

p = pd.Period(freq='W', year=2272)
pd.Period(str(p))  # raises DateParseError

The example above fails because 2272 is interpreted as the time 22:72, which is invalid, so a DateParseError is raised.
7 Weeks from the 24th century onwards
These suffer from the same problem as 6.
8 ISO 8601 ordinal dates
Dates such as "1981-095" should be supported.
9 Support Multi-Year Spans in pandas.Period Parsing
Dates such as "2019-2021" → Represents a multi-year period (Jan 1, 2019 - Dec 31, 2021)
Business Dates such as "2020Q2-2023Q1" → Represents a quarter-based multi-year period
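
The string forms mentioned in items 3, 8 and 9 can be told apart by shape alone. The following is a minimal sketch using only the Python standard library, not the pandas implementation: all function names are hypothetical, each parser returns plain `date` objects rather than a `Period`, and `date.fromisocalendar` requires Python 3.8 or later. The quarter-based span from item 9 is not covered.

```python
import re
from datetime import date, timedelta

# Hypothetical patterns; each must match the whole string.
_ISO_WEEK = re.compile(r"(\d{4})-W(\d{2})")     # e.g. "2022-W31"
_ISO_ORDINAL = re.compile(r"(\d{4})-(\d{3})")   # e.g. "1981-095"
_YEAR_SPAN = re.compile(r"(\d{4})-(\d{4})")     # e.g. "2019-2021"


def parse_iso_week(text):
    """Return (monday, sunday) for an ISO 8601 week string."""
    match = _ISO_WEEK.fullmatch(text)
    if match is None:
        raise ValueError(f"not an ISO 8601 week: {text!r}")
    year, week = map(int, match.groups())
    monday = date.fromisocalendar(year, week, 1)  # ValueError on a bad week
    return monday, monday + timedelta(days=6)


def parse_week_range(text):
    """Return (start, end) for a 'YYYY-MM-DD/YYYY-MM-DD' week string."""
    start_text, end_text = text.split("/")
    start = date.fromisoformat(start_text)
    end = date.fromisoformat(end_text)
    if end - start != timedelta(days=6):
        raise ValueError(f"range does not span one week: {text!r}")
    return start, end


def parse_iso_ordinal(text):
    """Return the date for an ISO 8601 ordinal date (year and day of year)."""
    match = _ISO_ORDINAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not an ISO 8601 ordinal date: {text!r}")
    year, day_of_year = map(int, match.groups())
    result = date(year, 1, 1) + timedelta(days=day_of_year - 1)
    if day_of_year < 1 or result.year != year:
        raise ValueError(f"day of year out of range: {text!r}")
    return result


def parse_year_span(text):
    """Return (first_day, last_day) for a 'YYYY-YYYY' multi-year span."""
    match = _YEAR_SPAN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a year span: {text!r}")
    first, last = map(int, match.groups())
    if first > last:
        raise ValueError(f"span ends before it starts: {text!r}")
    return date(first, 1, 1), date(last, 12, 31)


print(parse_iso_week("2022-W31"))
print(parse_iso_ordinal("1981-095"))
```

Splitting the week range on "/" before parsing each half as a date sidesteps the problem described in items 6 and 7, because a year such as 2272 is never offered to a time-of-day parser.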

Test cases

Many test cases are given in the original issue pandas-dev#48000 and need to be implemented for each of these features. Additional test cases beyond those in the original issue should also be implemented.

Definition of Done

  • Each feature is either implemented or given a valid reason for not being implemented

  • Each feature has more than one unit test attached to it

@Herdi2 Herdi2 added the feature New feature or request label Feb 26, 2025
KWJ222 pushed a commit that referenced this issue Mar 4, 2025
Added `COVERAGE.md` which will detail our work
plan and division of labor.
KWJ222 pushed a commit that referenced this issue Mar 4, 2025
Instructions were added to include how to install
`Coverage.py`
KWJ222 pushed a commit that referenced this issue Mar 4, 2025
This change adds the CCN to each member's
functions, and links to documentation for `lizard`.
KWJ222 pushed a commit that referenced this issue Mar 4, 2025
This change adds the skeleton for the report.
annekh99 added a commit that referenced this issue Mar 6, 2025
Added log containing the main branch results of the test suite located in pandas/tests/tslibs/test_period.py, containing the tests associated with issue #1.
annekh99 added a commit that referenced this issue Mar 6, 2025
Added log containing the testing branch (NB: not including new functionality) results of the test suite located in pandas/tests/tslibs/test_period.py, containing the tests associated with issue #1.
annekh99 added a commit that referenced this issue Mar 6, 2025
Added regression test log containing test results for (almost) the full Pandas test suite prior to resolution of issue #1.
annekh99 added a commit that referenced this issue Mar 7, 2025
Added regression test log containing test results for (almost) the full Pandas test suite after resolution of issue #1.