
replace format with %s in logging calls #92


Merged
merged 2 commits into pvlib:master from wholmgren:logging on Oct 28, 2015

Conversation

wholmgren
Member

closes #89

I'll rebase this on #90 once I work out yet another CI issue in that.
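
For reference, a minimal sketch of the kind of change this PR makes (the logger name matches pvlib's pvl_logger; the DataFrame and the message are made up for illustration):

import logging

import pandas as pd

pvl_logger = logging.getLogger('pvlib')

# hypothetical stand-in for a large intermediate DataFrame
tracker_angles = pd.DataFrame({'tracker_theta': [0.0, 15.0, 30.0]})

# before: str.format builds the (potentially huge) string up front,
# even when DEBUG logging is disabled
pvl_logger.debug('tracker angles:\n{}'.format(tracker_angles))

# after: %s-style arguments are only formatted if a handler actually
# emits the DEBUG record, so the DataFrame repr is never built at the
# default WARNING level
pvl_logger.debug('tracker angles:\n%s', tracker_angles)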

@wholmgren wholmgren added the bug label Oct 27, 2015
@wholmgren
Member Author

Rebased. I'll wait a day before merging.

@jforbess
Contributor

Did you get an estimate on how much of an improvement this is in
performance?


@wholmgren
Member Author

Good question.

I only tested tracking.singleaxis since it has by far the most debug calls with large arrays or dataframes; calling format on a single number is pretty fast (more on this at the bottom). I ran the first 4 cells of tracking.ipynb (http://nbviewer.ipython.org/github/pvlib/pvlib-python/blob/master/docs/tutorials/tracking.ipynb), then skipped down to the tracking.singleaxis examples (http://nbviewer.ipython.org/github/pvlib/pvlib-python/blob/master/docs/tutorials/tracking.ipynb#tracking.singleaxis-examples). I profiled the first cell in that section with %%timeit and %lprun...

With the current pvlib:master (9b2b368):

%%timeit
tracker_data = pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'],
                                         axis_tilt=0, axis_azimuth=180, max_angle=90,
                                         backtrack=True, gcr=2.0/7.0)

10 loops, best of 3: 138 ms per loop

And if I run line_profiler with

%lprun -f pvlib.tracking.singleaxis pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'], axis_tilt=0, axis_azimuth=180, max_angle=90, backtrack=True, gcr=2.0/7.0)

I see that ~75% of the time is spent on the pvl_logger.debug statements from lines 263-284.

After a git checkout logging and...

from importlib import reload
pvlib = reload(pvlib)
pvlib.tracking = reload(pvlib.tracking)

...with the current PR wholmgren:logging (3acc134):

%%timeit
tracker_data = pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'],
                                         axis_tilt=0, axis_azimuth=180, max_angle=90,
                                         backtrack=True, gcr=2.0/7.0)

10 loops, best of 3: 26 ms per loop

More than 5x faster! Face, meet palm.

And if I run line_profiler with

%lprun -f pvlib.tracking.singleaxis pvlib.tracking.singleaxis(pyephemout['apparent_zenith'], pyephemout['apparent_azimuth'], axis_tilt=0, axis_azimuth=180, max_angle=90, backtrack=True, gcr=2.0/7.0)

those logging calls take 0.0% of the time. Note that the logging call on line 100 still uses 1% of the time because Python must still evaluate apparent_zenith.head() and apparent_azimuth.head() before the logging module decides whether string formatting is necessary. I suppose it could be removed and people could use a debugger if they're really that interested. Anyway, the longest call is now line 214, widc = pd.Series(index=times), at 25% of the time, with the next two widc assignment lines combining for another 25%.
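
(For completeness, and not something this PR does: an explicit isEnabledFor guard would skip even the .head() calls. A rough sketch, with stand-in Series:)

import logging

import pandas as pd

pvl_logger = logging.getLogger('pvlib')

# stand-ins for the Series passed to tracking.singleaxis
apparent_zenith = pd.Series([10.0, 20.0, 30.0])
apparent_azimuth = pd.Series([90.0, 120.0, 150.0])

# the %s change defers string formatting, but .head() is still evaluated
# eagerly; guarding the call skips the argument evaluation entirely when
# no handler would emit DEBUG records
if pvl_logger.isEnabledFor(logging.DEBUG):
    pvl_logger.debug('apparent_zenith:\n%s\napparent_azimuth:\n%s',
                     apparent_zenith.head(), apparent_azimuth.head())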

Finally, some numbers to back up the claim that single-value formatting is already reasonably fast, and to justify not bothering to benchmark the rest of the library:

# numpy float
In [21]:
a = np.float64(1)
b = np.float64(2)

In [22]:
%%timeit
'{} {}'.format(a, b)
The slowest run took 15.96 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.91 µs per loop

# python int
In [23]:
a = 1
b = 2

In [24]:
%%timeit
'{} {}'.format(a, b)
The slowest run took 8.39 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 535 ns per loop

# python float
In [27]:
a = 1.
b = 2.

In [29]:
%%timeit
'{} {}'.format(a, b)
The slowest run took 10.21 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 916 ns per loop

So formatting native Python ints is faster than native Python floats, which in turn are faster than numpy floats. All of these times are much faster than any of the functions they'd appear in.
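
(If anyone wants to reproduce the single-value comparison outside IPython, here is a minimal sketch using the timeit module; exact numbers will of course vary by machine and Python version.)

import timeit

import numpy as np

cases = {
    'numpy float': (np.float64(1), np.float64(2)),
    'python int': (1, 2),
    'python float': (1.0, 2.0),
}

for name, (a, b) in cases.items():
    total = timeit.timeit(lambda: '{} {}'.format(a, b), number=1000000)
    # total seconds for 10**6 calls is numerically equal to microseconds per call
    print('{:<12s} {:.3f} us per call'.format(name, total))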

@jforbess
Contributor

Great, thanks.


@wholmgren
Member Author

I added a 0.2.2 whatsnew doc with a note about this change. It can be renamed in the future if we go straight to 0.3.

wholmgren added a commit that referenced this pull request Oct 28, 2015
replace format with %s in logging calls
@wholmgren wholmgren merged commit 4cda11a into pvlib:master Oct 28, 2015
@wholmgren wholmgren deleted the logging branch October 28, 2015 18:59
Successfully merging this pull request may close these issues.

make logging calls more efficient