Bug Report: Bug on treating timestamp with timezone info #236

Closed
2young-2simple-sometimes-naive opened this issue Aug 12, 2020 · 12 comments
Labels
bug Something isn't working released code merged into repo AND released to Pypi

Comments

@2young-2simple-sometimes-naive

Describe the bug
I am plotting a pandas DataFrame with timestamps as the index. The plotter labels the x-axis in Zulu (UTC) time instead of my local time.
Screenshots
1

Desktop (please complete the following information):

  • OS: Mac 10.15.6
  • Version 0.12.7a0
@2young-2simple-sometimes-naive 2young-2simple-sometimes-naive added the bug Something isn't working label Aug 12, 2020
@DanielGoldfarb
Collaborator

DanielGoldfarb commented Aug 12, 2020

The plotter plots whatever time your data has. Please post your data as well as the code you used to make this plot, and I will take a look. If your data contains timezone information and mplfinance is not handling it correctly, the only way I will be able to debug it is to have both the data and the code. Thank you.

@2young-2simple-sometimes-naive
Author

2young-2simple-sometimes-naive commented Aug 12, 2020

The plotter plots whatever time your data has. Please post your data as well as the code you used to make this plot, and I will take a look. If your data contains timezone information and mplfinance is not handling it correctly, the only way I will be able to debug it is to have both the data and the code. Thank you.

df.data.zip
use the following code:

import pickle
import mplfinance as fplt
import pandas as pd

# Load the pickled DataFrame from the attached df.data file
with open("df.data", "rb") as f:
    df = pickle.load(f)
print(df)
fplt.plot(df,type='candle',style='charles',title='SPY',ylabel='Price ($)',figscale=2.0)

Notice that the pandas DataFrame index runs from 04:00:00 to 19:59:00,
but the plot's x-axis runs from 8:00 to around 23:00.

                         open    high     low   close   volume      vwap

timestamp
2020-08-11 04:00:00-04:00 336.93 337.13 336.93 337.05 15800.0 337.0059
2020-08-11 04:01:00-04:00 337.00 337.04 336.99 336.99 3330.0 337.0265
2020-08-11 04:02:00-04:00 336.84 336.85 336.84 336.85 4378.0 336.8499
2020-08-11 04:04:00-04:00 336.84 336.84 336.79 336.79 400.0 336.8150
2020-08-11 04:07:00-04:00 336.88 336.89 336.88 336.88 40800.0 336.8898
... ... ... ... ... ... ...
2020-08-11 19:55:00-04:00 333.65 333.65 333.60 333.60 2200.0 333.6248
2020-08-11 19:56:00-04:00 333.58 333.60 333.56 333.56 9186.0 333.5638
2020-08-11 19:57:00-04:00 333.56 333.56 333.55 333.55 20848.0 333.5510
2020-08-11 19:58:00-04:00 333.55 333.55 333.52 333.52 25120.0 333.5491
2020-08-11 19:59:00-04:00 333.52 333.54 333.35 333.37 47534.0 333.4619

[866 rows x 6 columns]

output

@justinabate

A similar finding on my end: the pandas timestamps are in Eastern time, but the mpf.plot x-axis is in UTC.
Ubuntu 18.04.4 LTS

image

image

timestamps were localized from broker data as follows:
cur_time = eastern.localize(datetime.strptime(json_response[i]['time'], "%Y-%m-%dT%H:%M:%S"));
where:
eastern = pytz.timezone('US/Eastern');

and kwargs for the mpf.plot call:

kwargs = dict(
    block=False,
    returnfig=True,
    type='candle',
    style=s,
    volume=False,
    figratio=(16, 9),
    datetime_format='%%m-%d | %H:%M %Z',
    xrotation=90,
    scale_width_adjustment=dict(volume=0.4, candle=1.8),
)

@DanielGoldfarb
Collaborator

@2young-2simple-sometimes-naive
I am unable to use the DataFrame you provided. I get the error

'DataFrame' has no attribute '_data'

when I try to print it, and if I try to call .head() on the DataFrame I get a "too many recursions" error.

Perhaps use the dill package instead of pickle. It is basically the same as pickle, except that pickle does not correctly process certain native Python objects. Alternatively, you can just save your DataFrame as a CSV file using Pandas, and then I can read it with Pandas.
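
For anyone following along, here is a minimal sketch of the CSV route, assuming a small stand-in DataFrame (the two rows and the column names below are illustrative, not the reporter's actual file). It writes the frame with `DataFrame.to_csv` and reads it back with `pd.read_csv(..., index_col=0, parse_dates=True)`. An in-memory buffer is used so the snippet is self-contained; a file path should work the same way.

```python
import io
import pandas as pd

# Hypothetical stand-in data: two rows whose timestamps carry a -04:00 offset.
index = pd.to_datetime(["2020-08-11 04:00:00-04:00", "2020-08-11 04:01:00-04:00"])
df = pd.DataFrame({"open": [336.93, 337.00], "close": [337.05, 336.99]}, index=index)
df.index.name = "timestamp"

# Write the DataFrame as CSV text (an in-memory buffer instead of a file).
buffer = io.StringIO()
df.to_csv(buffer)

# Read it back, parsing the first column as the datetime index.
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=True)
print(restored.index)
```

Unlike a pickle, the CSV text does not depend on the pandas version that wrote it, which is why it sidesteps the `_data` attribute error above.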

@2young-2simple-sometimes-naive
Author

Here you go.
spy.zip

@DanielGoldfarb
Collaborator

@2young-2simple-sometimes-naive
Thanks. I was able to reproduce it anyway using just the few lines printed in your comment above.

It appears that if I plot the data using Pandas, it either ignores the time zone information or converts it to local time; at any rate, it displays the times as you expect (without adding the 4 hours).

If I plot the data using Matplotlib directly, however, it does what mplfinance is doing.

I'm going to play around with this a little more to see what is happening.

@DanielGoldfarb
Collaborator

It appears that Pandas itself is converting the times when taking the values from the index:
Notice the difference between df.index and df.index.values

Python 3.8.3 (default, Jul  2 2020, 16:21:59)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import pandas as pd
In [2]: df = pd.read_csv('issue236.df.scratchdata',index_col=0,parse_dates=True)
In [3]: df
Out[3]:
                             Open    High     Low   Close   Volume      vwap
Date
2020-08-11 04:00:00-04:00  336.93  337.13  336.93  337.05  15800.0  337.0059
2020-08-11 04:01:00-04:00  337.00  337.04  336.99  336.99   3330.0  337.0265
2020-08-11 04:02:00-04:00  336.84  336.85  336.84  336.85   4378.0  336.8499
2020-08-11 04:04:00-04:00  336.84  336.84  336.79  336.79    400.0  336.8150
2020-08-11 04:07:00-04:00  336.88  336.89  336.88  336.88  40800.0  336.8898

In [4]: df.index
Out[4]:
DatetimeIndex(['2020-08-11 04:00:00-04:00', '2020-08-11 04:01:00-04:00',
               '2020-08-11 04:02:00-04:00', '2020-08-11 04:04:00-04:00',
               '2020-08-11 04:07:00-04:00'],
              dtype='datetime64[ns, pytz.FixedOffset(-240)]', name='Date', freq=None)

In [5]: df.index.values
Out[5]:
array(['2020-08-11T08:00:00.000000000', '2020-08-11T08:01:00.000000000',
       '2020-08-11T08:02:00.000000000', '2020-08-11T08:04:00.000000000',
       '2020-08-11T08:07:00.000000000'], dtype='datetime64[ns]')

However, mplfinance uses Pandas' index.to_pydatetime() followed by Matplotlib's mdates.date2num().
It appears (see below) that mdates.date2num() converts time-zone-aware datetimes to UTC.

In [6]: df.index.to_pydatetime()
Out[6]:
array([datetime.datetime(2020, 8, 11, 4, 0, tzinfo=pytz.FixedOffset(-240)),
       datetime.datetime(2020, 8, 11, 4, 1, tzinfo=pytz.FixedOffset(-240)),
       datetime.datetime(2020, 8, 11, 4, 2, tzinfo=pytz.FixedOffset(-240)),
       datetime.datetime(2020, 8, 11, 4, 4, tzinfo=pytz.FixedOffset(-240)),
       datetime.datetime(2020, 8, 11, 4, 7, tzinfo=pytz.FixedOffset(-240))],
      dtype=object)

In [7]: import matplotlib.dates  as mdates
In [8]: mdates.date2num(df.index.to_pydatetime())
Out[8]:
array([18485.33333333, 18485.33402778, 18485.33472222, 18485.33611111,
       18485.33819444])

In [9]: mdates.num2date( mdates.date2num(df.index.to_pydatetime()) )
Out[9]:
[datetime.datetime(2020, 8, 11, 8, 0, tzinfo=datetime.timezone.utc),
 datetime.datetime(2020, 8, 11, 8, 1, tzinfo=datetime.timezone.utc),
 datetime.datetime(2020, 8, 11, 8, 2, tzinfo=datetime.timezone.utc),
 datetime.datetime(2020, 8, 11, 8, 4, tzinfo=datetime.timezone.utc),
 datetime.datetime(2020, 8, 11, 8, 7, tzinfo=datetime.timezone.utc)]
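
The same effect can be checked without any DataFrame. This is a small sketch, assuming a fixed -04:00 offset built with the standard library (the variable names are illustrative): it compares `mdates.date2num()` for a time-zone-aware datetime against the same wall-clock time with the tzinfo stripped.

```python
from datetime import datetime, timedelta, timezone
import matplotlib.dates as mdates

# A fixed -04:00 offset, matching the offset in the data above.
edt = timezone(timedelta(hours=-4))

aware = datetime(2020, 8, 11, 4, 0, tzinfo=edt)  # 04:00 local, tz-aware
naive = aware.replace(tzinfo=None)               # 04:00, no tz info

num_aware = mdates.date2num(aware)
num_naive = mdates.date2num(naive)

# date2num returns days as floats, so scale the difference to hours.
diff_hours = round((num_aware - num_naive) * 24, 6)
print(diff_hours)

# Converting back shows the aware value landed on 08:00 UTC.
roundtrip = mdates.num2date(num_aware)
print(roundtrip)
```

The four-hour gap between the two numbers is the shift that shows up on the x-axis.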

@DanielGoldfarb
Collaborator

@justinabate

My working solution to this:
Plot dataframe using naive local time, via tz_localize(None)
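
A minimal sketch of that workaround, assuming a small stand-in DataFrame (the data and names here are illustrative). `DatetimeIndex.tz_localize(None)` drops the time zone while keeping the local wall-clock time, so whatever is plotted afterwards sees 04:00 rather than 08:00.

```python
import pandas as pd

# Hypothetical stand-in data with a -04:00 offset in the index.
index = pd.to_datetime(["2020-08-11 04:00:00-04:00", "2020-08-11 04:01:00-04:00"])
df = pd.DataFrame({"Close": [337.05, 336.99]}, index=index)

# Strip the tz info but keep the local wall-clock times.
naive_df = df.copy()
naive_df.index = df.index.tz_localize(None)

print(df.index.values[0])        # underlying UTC value of the aware index
print(naive_df.index.values[0])  # local wall-clock value after the workaround
```

The naive DataFrame can then be passed to mpf.plot() in place of the original.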

Before:
image

image

After:
image

image

image

@DanielGoldfarb
Collaborator

@justinabate
Thank you for posting the workaround. I was about to post something similar myself. I will fix this in the package, but I've got a bunch of other stuff going on right now and might not get to it for a few weeks.
All the best. --Daniel

@justinabate

@DanielGoldfarb thank you for your work on this library

DanielGoldfarb added a commit that referenced this issue Dec 22, 2020
New handling of volume exponent; and assorted scratch_pad additions:

This PR is primarily to implement a new way to handle the volume exponent; the new way avoids calling `.draw()` which in turn eliminates the bug mentioned in issue #296, and also mentioned/described in a previous email from S.G. ([commit 2c88663](2c88663))

This PR also includes a lot of scratch_pad work for investigations of various issues (#282, #236, #241), and for an mplfinance presentation: including comparison of old and new api's, intraday animation with volumes, generation of animation gifs, etc.
@DanielGoldfarb DanielGoldfarb added the released code merged into repo AND released to Pypi label Dec 23, 2020
@DanielGoldfarb
Collaborator

This is released now. To get the newest code:
pip install --upgrade mplfinance

If anyone was depending on the old behavior (that used time zone info, if present in the datetime index, to convert to UTC) they can access that old behavior by setting kwarg tz_localize=False when calling mpf.plot().

The new behavior is to ignore tzinfo in the datetime index and always plot according to the local time in the datetime index.
