
replace format with %s in logging calls #92


Merged
merged 2 commits into pvlib:master from wholmgren:logging on Oct 28, 2015

Conversation

wholmgren
Member

closes #89

I'll rebase this on #90 once I work out yet another CI issue in that.
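
For reference, a minimal sketch of the kind of change this PR makes (the logger name matches pvlib's pvl_logger; the DataFrame and the message are made up for illustration):

import logging

import pandas as pd

pvl_logger = logging.getLogger('pvlib')

# hypothetical stand-in for a large intermediate DataFrame
tracker_angles = pd.DataFrame({'tracker_theta': [0.0, 15.0, 30.0]})

# before: str.format builds the (potentially huge) string up front,
# even when DEBUG logging is disabled
pvl_logger.debug('tracker angles:\n{}'.format(tracker_angles))

# after: %s-style arguments are only formatted if a handler actually
# emits the DEBUG record, so the DataFrame repr is never built at the
# default WARNING level
pvl_logger.debug('tracker angles:\n%s', tracker_angles)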

@wholmgren wholmgren added the bug label Oct 27, 2015
@wholmgren
Member Author

Rebased. I'll wait a day before merging.

@jforbess
Contributor

Did you get an estimate on how much of an improvement this is in
performance?


@wholmgren
Member Author

Good question.

I only tested tracking.singleaxis since it has by far the most debug calls with large arrays or dataframes; calling format on a single number is pretty fast (more on this at the bottom). I ran the first 4 cells of tracking.ipynb (http://nbviewer.ipython.org/github/pvlib/pvlib-python/blob/master/docs/tutorials/tracking.ipynb), then skipped down to the tracking.singleaxis examples (http://nbviewer.ipython.org/github/pvlib/pvlib-python/blob/master/docs/tutorials/tracking.ipynb#tracking.singleaxis-examples). I profiled the first cell in that section with %%timeit and %lprun...

With the current pvlib:master (9b2b368):

%%timeit
tracker_data = pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'],
                                         axis_tilt=0, axis_azimuth=180, max_angle=90,
                                         backtrack=True, gcr=2.0/7.0)

10 loops, best of 3: 138 ms per loop

And if I run line_profiler with

%lprun -f pvlib.tracking.singleaxis pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'], axis_tilt=0, axis_azimuth=180, max_angle=90, backtrack=True, gcr=2.0/7.0)

I see that ~75% of the time is spent on the pvl_logger.debug statements from lines 263-284.

After a git checkout logging and...

from importlib import reload
pvlib = reload(pvlib)
pvlib.tracking = reload(pvlib.tracking)

...with the current PR wholmgren:logging (3acc134):

%%timeit
tracker_data = pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'],
                                         axis_tilt=0, axis_azimuth=180, max_angle=90,
                                         backtrack=True, gcr=2.0/7.0)

10 loops, best of 3: 26 ms per loop

More than 5x faster! Face, meet palm.

And if I run line_profiler with

%lprun -f pvlib.tracking.singleaxis pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'], axis_tilt=0, axis_azimuth=180, max_angle=90, backtrack=True, gcr=2.0/7.0)

those logging calls take 0.0% of the time. Note that the logging call on line 100 still uses 1% of the time because Python must still evaluate apparent_zenith.head() and apparent_azimuth.head() before the logging module decides whether string formatting is necessary. I suppose it could be removed and people could use a debugger if they're really that interested. Anyway, the longest call is now line 214, widc = pd.Series(index=times), at 25% of the time, with the next two widc assignment lines combining for another 25%.
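
(For completeness, and not something this PR does: an explicit isEnabledFor guard would skip even the .head() calls. A rough sketch, with stand-in Series:)

import logging

import pandas as pd

pvl_logger = logging.getLogger('pvlib')

# stand-ins for the Series passed to tracking.singleaxis
apparent_zenith = pd.Series([10.0, 20.0, 30.0])
apparent_azimuth = pd.Series([90.0, 120.0, 150.0])

# the %s change defers string formatting, but .head() is still evaluated
# eagerly; guarding the call skips the argument evaluation entirely when
# no handler would emit DEBUG records
if pvl_logger.isEnabledFor(logging.DEBUG):
    pvl_logger.debug('apparent_zenith:\n%s\napparent_azimuth:\n%s',
                     apparent_zenith.head(), apparent_azimuth.head())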

Finally, some numbers to back up the claim that single-value formatting is already reasonably fast, and to justify not bothering to benchmark the rest of the library:

# numpy float
In [21]:
a = np.float64(1)
b = np.float64(2)

In [22]:
%%timeit
'{} {}'.format(a, b)
The slowest run took 15.96 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.91 µs per loop

# python int
In [23]:
a = 1
b = 2

In [24]:
%%timeit
'{} {}'.format(a, b)
The slowest run took 8.39 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 535 ns per loop

# python float
In [27]:
a = 1.
b = 2.

In [29]:
%%timeit
'{} {}'.format(a, b)
The slowest run took 10.21 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 916 ns per loop

So formatting native Python ints is faster than native Python floats, which in turn are faster than numpy floats. All of these times are much faster than any of the functions they'd appear in.
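
(If anyone wants to reproduce the single-value comparison outside IPython, here is a minimal sketch using the timeit module; exact numbers will of course vary by machine and Python version.)

import timeit

import numpy as np

cases = {
    'numpy float': (np.float64(1), np.float64(2)),
    'python int': (1, 2),
    'python float': (1.0, 2.0),
}

for name, (a, b) in cases.items():
    total = timeit.timeit(lambda: '{} {}'.format(a, b), number=1000000)
    # total seconds for 10**6 calls is numerically equal to microseconds per call
    print('{:<12s} {:.3f} us per call'.format(name, total))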

@jforbess
Contributor

Great, thanks.


@wholmgren
Member Author

I added a 0.2.2 whatsnew doc with a note about this change. It can be renamed in the future if we go straight to 0.3.

wholmgren added a commit that referenced this pull request Oct 28, 2015
replace format with %s in logging calls
@wholmgren wholmgren merged commit 4cda11a into pvlib:master Oct 28, 2015
@wholmgren wholmgren deleted the logging branch October 28, 2015 18:59
Successfully merging this pull request may close these issues.

make logging calls more efficient