ENH: let .dt.isocalendar() return float64 in presence of NaT #54657

Open
1 of 3 tasks
kmuehlbauer opened this issue Aug 21, 2023 · 5 comments
Labels
Datetime (Datetime data dtype) · Dtype Conversions (Unexpected or buggy dtype conversions) · Enhancement · Needs Discussion (Requires discussion from core team before further action)

Comments

@kmuehlbauer
Contributor

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently .dt.isocalendar() returns UInt32 with pd.NA in the presence of NaT, whereas .dt.year returns float64 with np.nan. We encountered this discrepancy in xarray (pydata/xarray#7928).

import pandas as pd
s = pd.to_datetime(pd.Series(['2021-12-01', '2021-12-02', '2021-12-03', pd.NaT]))
print("ISOCALENDAR")
print(s.dt.isocalendar().year)
print("YEAR")
print(s.dt.year)

Output:

ISOCALENDAR
0    2021
1    2021
2    2021
3    <NA>
Name: year, dtype: UInt32
YEAR
0    2021.0
1    2021.0
2    2021.0
3       NaN
dtype: float64

We could align that at the respective xarray accessor, but it would make more sense to align it here.
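
Until the behaviour is aligned in pandas, a downstream workaround might look like the sketch below. It is only an illustration, not what xarray actually does: it assumes that casting the nullable UInt32 result to float64 is acceptable, in which case pd.NA becomes np.nan and the columns match the dtype that .dt.year returns when NaT is present.

```python
import numpy as np
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-12-01", "2021-12-02", "2021-12-03", pd.NaT]))

# isocalendar() yields the columns year/week/day as nullable UInt32 (pd.NA for NaT).
iso = s.dt.isocalendar()

# Casting to float64 turns pd.NA into np.nan, matching s.dt.year in this case.
iso_float = iso.astype("float64")

print(iso_float.dtypes)
print(iso_float["year"])
```

The cast is applied unconditionally here. A caller that wants to keep integers when no NaT is present would have to check s.isna().any() first.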

Feature Description

One solution would be to reuse what _field_accessor already does, namely calling _maybe_mask_results, to convert the result to float64 in the presence of NaT. Please have a look at the code below.

 def isocalendar(self) -> DataFrame:
      """
      Calculate year, week, and day according to the ISO 8601 standard.

      .. versionadded:: 1.1.0

      Returns
      -------
      DataFrame
          With columns year, week and day.

      See Also
      --------
      Timestamp.isocalendar : Function return a 3-tuple containing ISO year,
          week number, and weekday for the given Timestamp object.
      datetime.date.isocalendar : Return a named tuple object with
          three components: year, week and weekday.

      Examples
      --------
      >>> idx = pd.date_range(start='2019-12-29', freq='D', periods=4)
      >>> idx.isocalendar()
                  year  week  day
      2019-12-29  2019    52    7
      2019-12-30  2020     1    1
      2019-12-31  2020     1    2
      2020-01-01  2020     1    3
      >>> idx.isocalendar().week
      2019-12-29    52
      2019-12-30     1
      2019-12-31     1
      2020-01-01     1
      Freq: D, Name: week, dtype: UInt32
      """
      from pandas import DataFrame

      values = self._local_timestamps()
      sarray = fields.build_isocalendar_sarray(values, reso=self._creso)
      dtype = np.dtype([('year', 'float64'), ('week', 'float64'), ('day', 'float64')])
      sarray = self._maybe_mask_results(
          sarray, fill_value=None, convert=dtype
      )
      dtype = None if sarray.dtype == dtype else "UInt32"
      iso_calendar_df = DataFrame(
          sarray, columns=["year", "week", "day"],  dtype=dtype
      )
      if dtype != sarray.dtype:
          if self._hasna:
              iso_calendar_df.iloc[self._isnan] = None

      return iso_calendar_df
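
For readers unfamiliar with the masking step, the following standalone sketch shows the idea on a plain NumPy array. The function mask_results is a hypothetical stand-in written for this illustration, not the pandas-internal _maybe_mask_results, and it assumes a simplified behaviour: convert to the requested dtype and write the fill value at the NaT positions, but only if there are any.

```python
import numpy as np


def mask_results(values, isnan, fill_value=np.nan, convert="float64"):
    """Hypothetical, simplified masking helper (not the pandas internal)."""
    if not isnan.any():
        # No missing entries: keep the integer values untouched.
        return values
    # astype returns a converted copy, so the input array is not modified.
    result = values.astype(convert)
    result[isnan] = fill_value
    return result


# Integer field values; the last slot is a placeholder for a NaT position.
years = np.array([2021, 2021, 2021, -1], dtype="int32")
isnan = np.array([False, False, False, True])

masked = mask_results(years, isnan)
unmasked = mask_results(years[:3], isnan[:3])

print(masked.dtype, unmasked.dtype)
```

This is why the result dtype depends on the data: with a missing entry the values are promoted to float64 so they can hold np.nan, and without one they stay integers.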

I can move this into a Pull Request if there is interest. I'll also try to implement a workaround in xarray until a final solution has been settled on.

Alternative Solutions

No alternative solutions considered.

Additional Context

No response

@mroeschke
Member

Makes sense to align. I think it would actually make sense for dt.year/month/day to return UInt32 to preserve the integer nature of these values.
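
As a rough illustration of that direction (an editorial sketch, not part of the comment and not a proposed pandas change), the same result can be approximated today by casting the float64 output of .dt.year to the nullable UInt32 dtype, which maps np.nan to pd.NA:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2021-12-01", "2021-12-02", "2021-12-03", pd.NaT]))

# With NaT present, .dt.year is float64 with NaN; the cast yields UInt32 with pd.NA.
year_nullable = s.dt.year.astype("UInt32")

print(year_nullable)
```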

@kmuehlbauer
Contributor Author

Thanks @mroeschke for getting to this.

How will NaT be translated in these cases?

@kmuehlbauer
Contributor Author

Ok obviously as pd.NA.

@mroeschke
Member

> Ok obviously as pd.NA.

Yeah that's what I was thinking. Could use another opinion from other core devs

@mroeschke added the Datetime, Dtype Conversions, and Needs Discussion labels and removed the Needs Triage label on Sep 19, 2023
@kmuehlbauer
Contributor Author

Still an issue with latest pandas 2.2.3. Is there any news here?
