
BUG: DatetimeIndex.to_period().to_timestamp() forgets freq value #38885


Open
2 of 3 tasks
aiwalter opened this issue Jan 1, 2021 · 9 comments
Labels
Bug · Datetime (Datetime data dtype) · freq retention (User expects "freq" attribute to be preserved) · Frequency (DateOffsets) · Period (Period data type)

Comments

@aiwalter

aiwalter commented Jan 1, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

>>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS")
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq='MS')

>>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS").to_period().to_timestamp()
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq=None)

Problem description

Why is freq=None here in the end? Is it possible to make conversion between DatetimeIndex and PeriodIndex more reliable in any way? How can I know which freq values are not able to be converted from PeriodIndex into DatetimeIndex?

Expected Output:

I would expect to have the same freq value after doing .to_period().to_timestamp().

>>> pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS").to_period().to_timestamp()
DatetimeIndex(['2018-01-01', '2018-02-01'], dtype='datetime64[ns]', freq="MS")
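To make the round trip concrete, here is a minimal sketch (not part of the original report) that prints the frequency at each step. It uses only the public calls already shown above; the variable names are illustrative. The intermediate PeriodIndex carries the monthly period frequency "M" rather than the "MS" offset, which appears to be the point where the original offset is dropped.

```python
import pandas as pd

dti = pd.date_range(start="2018-01-01", end="2018-02-01", freq="MS")
pi = dti.to_period()           # DatetimeIndex -> PeriodIndex
roundtrip = pi.to_timestamp()  # PeriodIndex -> DatetimeIndex

print(dti.freqstr)   # offset alias of the original index
print(pi.freqstr)    # period frequency chosen by to_period()

# Element-wise comparison: the timestamps themselves survive the round trip.
values_match = bool((roundtrip == dti).all())
print(values_match)

# The attribute this issue is about; its value may depend on the pandas version.
print(roundtrip.freq)
```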

Output of pd.show_versions()

<details>

INSTALLED VERSIONS
------------------
commit           : 3e89b4c4b1580aa890023fc550774e63d499da25
python           : 3.7.9.final.0
python-bits      : 64
OS               : Windows
OS-release       : 10
Version          : 10.0.18362
machine          : AMD64
processor        : AMD64 Family 21 Model 112 Stepping 0, AuthenticAMD
byteorder        : little
LC_ALL           : None
LANG             : None
LOCALE           : None.None

pandas           : 1.2.0
numpy            : 1.19.4
pytz             : 2019.3
dateutil         : 2.8.1
pip              : 20.3.1
setuptools       : 51.0.0.post20201207
Cython           : 0.29.17
pytest           : 6.1.2
hypothesis       : None
sphinx           : None
blosc            : None
feather          : None
xlsxwriter       : None
lxml.etree       : None
html5lib         : None
pymysql          : None
psycopg2         : None
jinja2           : 2.11.2
IPython          : 7.19.0
pandas_datareader: None
bs4              : None
bottleneck       : None
fsspec           : 0.8.4
fastparquet      : None
gcsfs            : None
matplotlib       : 3.3.3
numexpr          : None
odfpy            : None
openpyxl         : None
pandas_gbq       : None
pyarrow          : None
pyxlsb           : None
s3fs             : None
scipy            : 1.5.4
sqlalchemy       : None
tables           : None
tabulate         : None
xarray           : None
xlrd             : None
xlwt             : None
numba            : 0.50.1
</details>
@aiwalter added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Jan 1, 2021
@jbrockmendel
Member

Long story short: no. What you can do is something like:

import pandas as pd

dti = pd.date_range(start='1/1/2018 00:00:00', end='2/1/2018 00:01:00', freq="MS")
pi = dti.to_period()
roundtrip = pi.to_timestamp()
# Re-attach a frequency by inferring it from the values
roundtrip = pd.DatetimeIndex(roundtrip, freq="infer")

@aiwalter
Author

aiwalter commented Jan 1, 2021

@jbrockmendel There is also an issue with freq="MS" when converting a single Timestamp:

idx = pd.date_range(start='1/1/2018 00:00:00', end='1/5/2018 00:01:00', freq="MS")
idx[-1].to_period()

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
pandas\_libs\tslibs\period.pyx in pandas._libs.tslibs.period.freq_to_dtype_code()

AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-16-779b680b91e8> in <module>
      1 idx = pd.date_range(start='1/1/2018 00:00:00', end='1/5/2018 00:01:00', freq="MS")
----> 2 idx[-1].to_period()

pandas\_libs\tslibs\timestamps.pyx in pandas._libs.tslibs.timestamps._Timestamp.to_period()

pandas\_libs\tslibs\period.pyx in pandas._libs.tslibs.period.Period.__new__()

pandas\_libs\tslibs\period.pyx in pandas._libs.tslibs.period.freq_to_dtype_code()

ValueError: Invalid frequency: <MonthBegin>

This works without error for some freq values. It should at least be possible here too, since I can convert the whole index without error. I found a workaround, but I think this really is a bug.

Workaround:

idx = pd.date_range(start='1/1/2018 00:00:00', end='1/5/2018 00:01:00', freq="MS")
cutoff = idx[-1]
date = pd.DatetimeIndex([cutoff], freq=cutoff.freq)
cutoff = date.to_period()[0]
cutoff

Results in: Period('2018-01', 'M')
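A shorter alternative, sketched here under the assumption that the monthly period alias "M" is the intended target for "MS" (it is what the workaround above ends up with): pass the period frequency to `Timestamp.to_period` explicitly, so the call does not depend on any offset attached to the Timestamp. The name `period` is illustrative.

```python
import pandas as pd

idx = pd.date_range(start="2018-01-01", end="2018-01-05", freq="MS")
cutoff = idx[-1]

# Name the period frequency explicitly instead of relying on the
# offset carried by the Timestamp.
period = cutoff.to_period("M")
print(period)
```

This sidesteps the `<MonthBegin>` lookup entirely, at the cost of hard-coding the period frequency.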

@aiwalter
Author

aiwalter commented Jan 1, 2021

@jbrockmendel Your solution still results in freq=None for the two-element index, but I would expect freq="MS". So this only works with index length >= 3, right?

dti = pd.date_range(start='1/1/2018 00:00:00', end='3/1/2018 00:01:00', freq="MS")
pi = dti.to_period()
roundtrip = pi.to_timestamp()
roundtrip = pd.DatetimeIndex(roundtrip, freq="infer")
roundtrip

DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01'], dtype='datetime64[ns]', freq='MS')
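The length dependence can be checked directly. The sketch below is illustrative only (the names `two` and `three` are made up for this example): `freq="infer"` quietly yields no frequency for two entries, while `pd.infer_freq` itself raises when given fewer than three dates.

```python
import pandas as pd

two = pd.DatetimeIndex(["2018-01-01", "2018-02-01"])
three = pd.DatetimeIndex(["2018-01-01", "2018-02-01", "2018-03-01"])

# freq="infer" falls back to no frequency when nothing can be inferred.
inferred_two = pd.DatetimeIndex(two, freq="infer").freq
inferred_three = pd.DatetimeIndex(three, freq="infer").freqstr
print(inferred_two, inferred_three)

# pd.infer_freq refuses to guess from fewer than three values.
error = None
try:
    pd.infer_freq(two)
except ValueError as exc:
    error = exc
print(error)
```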

@simonjayhawkins added the Period (Period data type), Datetime (Datetime data dtype), and Frequency (DateOffsets) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Jan 2, 2021
@jbrockmendel
Member

So this is only working with index length >= 3, right?

Yes. I think with length<3 you'll need to specify freq="MS" (or freq=dti.freq) to get it back
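A minimal sketch of that suggestion, assuming the original index is still available so its offset can be reused (the name `restored` is illustrative):

```python
import pandas as pd

dti = pd.date_range(start="2018-01-01", end="2018-02-01", freq="MS")
roundtrip = dti.to_period().to_timestamp()

# Two entries are too few for inference, so hand the original offset back
# explicitly. pandas validates that the values conform to it.
restored = pd.DatetimeIndex(roundtrip, freq=dti.freq)
print(restored.freqstr)
```

Passing `freq="MS"` as a string should behave the same way; reusing `dti.freq` just avoids hard-coding the alias.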

AttributeError: 'pandas._libs.tslibs.offsets.MonthBegin' object has no attribute '_period_dtype_code'

That should be caught and re-raised with a more helpful exception+message. PR would be welcome.

@aiwalter
Author

aiwalter commented Jan 3, 2021

@jbrockmendel I raised a separate issue #38914

@MarcoGorelli
Member

Thanks @aiwalter for the report

So if #38914 is open for the error reporting from

idx = pd.date_range(start='1/1/2018 00:00:00', end='1/5/2018 00:01:00', freq="MS")
idx[-1].to_period()

is there anything else that needs to be done in this issue?

@aiwalter
Author

aiwalter commented Jan 3, 2021

Well, I think not, unless we turn this into an enhancement request so that conversion between DatetimeIndex and PeriodIndex becomes possible without any information loss. This would require a new freq value in PeriodIndex, similar to freq="MS" in DatetimeIndex. I don't know whether this is possible or wanted. There might also be other freq values with the same problem.
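The information loss can be illustrated with a small sketch (not from the thread; names are illustrative). A month-start index and a month-end index both convert to the same monthly PeriodIndex, so the PeriodIndex alone cannot tell which offset it came from. The offsets are passed as `pd.offsets` objects rather than string aliases, since the month-end alias differs between pandas versions.

```python
import pandas as pd

month_start = pd.date_range("2018-01-01", periods=3, freq=pd.offsets.MonthBegin())
month_end = pd.date_range("2018-01-31", periods=3, freq=pd.offsets.MonthEnd())

pi_start = month_start.to_period()
pi_end = month_end.to_period()

# Both offsets collapse to the same period frequency...
print(pi_start.freqstr, pi_end.freqstr)

# ...and to identical periods, so the original offset is not recoverable.
same_periods = bool((pi_start == pi_end).all())
print(same_periods)
```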

@mroeschke
Member

Since there doesn't seem to be much more movement on this issue toward creating a new freq value, I'm going to close it.

@aiwalter
Author

But the issue is not solved, so maybe better to keep it open?


5 participants