use horners method in DISC instead of calculating so many powers and polynomials #1180

mikofski · 2021-02-26T07:59:32Z

Describe the bug
the disc method has a lot of power and polynomials. IMO whenever possible, polynomials should use Horner's method which is already implemented in np.polyval but is easy to reimplement if needed.

To Reproduce

pvlib-python/pvlib/irradiance.py

Lines 1405 to 1422 in 8b98768

    
           # powers of kt will be used repeatedly, so compute only once 
        
           kt2 = kt * kt  # about the same as kt ** 2 
        
           kt3 = kt2 * kt  # 5-10x faster than kt ** 3 
        
           bools = (kt <= 0.6) 
        
           a = np.where(bools, 
        
                        0.512 - 1.56*kt + 2.286*kt2 - 2.222*kt3, 
        
                        -5.743 + 21.77*kt - 27.49*kt2 + 11.56*kt3) 
        
           b = np.where(bools, 
        
                        0.37 + 0.962*kt, 
        
                        41.4 - 118.5*kt + 66.05*kt2 + 31.9*kt3) 
        
           c = np.where(bools, 
        
                        -0.28 + 0.932*kt - 2.048*kt2, 
        
                        -47.01 + 184.2*kt - 222.0*kt2 + 73.81*kt3) 
        
           delta_kn = a + b * np.exp(c*am) 
        
           Knc = 0.866 - 0.122*am + 0.0121*am**2 - 0.000653*am**3 + 1.4e-05*am**4

0.512 - 1.56*kt + 2.286*kt2 - 2.222*kt3

becomes

0.512 - kt * (1.56 + kt * (2.286 - 2.222 * kt))

or

np.polyval([-2.222, 2.286, -1.56, 0.512], kt)

Expected behavior
a few math tricks like this (atan2, log1p, Horner's method, etc.) are musts for efficiency and numerical stability.

Versions:

pvlib.__version__: 0.8.1

The text was updated successfully, but these errors were encountered:

cwhanse · 2021-02-26T15:38:40Z

np.polyval([-2.222, 2.286, -1.56, 0.512], kt)

is more readable to me than the nested parentheses.

I'd be interested to see the performance difference.

wholmgren · 2021-02-26T16:13:08Z

I'm open to a PR that includes an asv test for efficiency and a regular test for precision. Keep in mind that _disc_kn is used within a loop in gti_dirint.

For anyone that comes across this issue when trying to understand why disc doesn't produce results that are as accurate as you'd like: this is not the cause of your problem. Search the pvlib google group and stack overflow tag for previous explanations.

mikofski · 2021-02-26T22:33:17Z

by the numbers, Horner's is 2x faster which np.polyval is 50% slower:

In [32]: am = np.linspace(1.5, 6.0, 40)

In [33]: Knc = 0.866 - 0.122*am + 0.0121*am**2 - 0.000653*am**3 + 1.4e-05*am**4

In [34]: np.allclose(Knc, 0.866 + am*(-0.122 + am*(0.0121 + am*(-0.000653 + 1.4e-05*am))))
Out[34]: True

In [35]: np.allclose(Knc, np.polyval(p, am))
Out[35]: True

In [36]: %timeit 0.866 - 0.122*am + 0.0121*am**2 - 0.000653*am**3 + 1.4e-05*am**4
10.9 µs ± 259 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [37]: %timeit np.polyval(p, am)
16 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [38]: %timeit 0.866 + am*(-0.122 + am*(0.0121 + am*(-0.000653 + 1.4e-05*am)))
4.67 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

since kt2 and kt3 are already pre-calculated and reused in a, b, and c I don't expect there to be any more time to be gained there, so really it's just Knc that could be 2x faster using Horner's. Tough call. Not worth it?

cwhanse · 2021-02-26T22:36:22Z

Good info about np.polyval, convenient but slower than doing it ourselves. In this case, performance > readability.

Not worth it?

It's a small effort to improve a frequently-used function so I'd say yes, worth it.

mikofski · 2021-02-27T05:04:59Z

I counted and surprisingly, there are the same flops in the existing kt, kt2, kt3 code than in Horner's, in fact, there are exactly 3 more, because it's unnecessary to pre-calculate kt, kt2, kt3 when using Horner's

By the numbers:

In [3]: kt = np.linspace(0.1, 0.9, 90)

In [4]: %paste
def test1(kt):
    kt2 = kt * kt  # about the same as kt ** 2
    kt3 = kt2 * kt  # 5-10x faster than kt ** 3

    bools = (kt <= 0.6)
    a = np.where(bools,
              0.512 - 1.56*kt + 2.286*kt2 - 2.222*kt3,
              -5.743 + 21.77*kt - 27.49*kt2 + 11.56*kt3)
    b = np.where(bools,
              0.37 + 0.962*kt,
              41.4 - 118.5*kt + 66.05*kt2 + 31.9*kt3)
    c = np.where(bools,
              -0.28 + 0.932*kt - 2.048*kt2,
              -47.01 + 184.2*kt - 222.0*kt2 + 73.81*kt3)
    return a, b, c

## -- End pasted text --

In [7]: a, b, c = test1(kt)

In [9]: %paste
def test2(kt):
    bools = (kt <= 0.6)
    a = np.where(bools,
              0.512 + kt*(-1.56 + kt*(2.286 - 2.222*kt)),
              -5.743 + kt*(21.77 + kt*(-27.49 + 11.56*kt)))
    b = np.where(bools,
              0.37 + 0.962*kt,
              41.4 + kt*(-118.5 + kt*(66.05 + 31.9*kt)))
    c = np.where(bools,
              -0.28 + kt*(0.932 - 2.048*kt),
              -47.01 + kt*(184.2 + kt*(-222.0 + 73.81*kt)))
    return a, b, c

## -- End pasted text --

In [10]: x, y, z = test2(kt)

In [11]: np.allclose(a,x)
Out[11]: True

In [12]: np.allclose(b,y)
Out[12]: True

In [13]: np.allclose(c,z)
Out[13]: True

In [14]: %timeit test2(kt)
23.3 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [15]: %timeit test2(kt)
23.9 µs ± 462 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [16]: %timeit test1(kt)
26.2 µs ± 179 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [17]: %timeit test1(kt)
27.3 µs ± 661 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Surprise! Horners is about 10% faster for this part of the DISC too!

adriesse · 2021-03-18T09:36:09Z

Miscellaneous observations:

Since kt is in the range 0-1 I would not expect numerical issues with any evaluation method. Air mass in the range 1-40 should be fine too.
For speed, you could also look into using np.dot with the factors.
The average scientist or student exploring the inner workings of pvlib will probably find the original easiest to understand.
Thinking about code clarity, you could consider changing the name bools to is_cloudy.

mikofski · 2021-03-18T21:14:40Z

Great feedback!

the average scientist or student exploring the inner workings of pvlib will probably find the original easiest to understand.

Possibly true, but I feel a bit like Horner's method, log1p, & arctan2 are tools of the trade like unit tests, Sphinx doc RST markup, Git, etc., that contributors & students of scientific computing must learn

Thinking about code clarity, you could consider changing the name bools to is_cloudy

I agree

For speed, you could also look into using np.dot with the factors.

When I test np.dot or np.linalg.norm or np.polyval they're usually slower than just x*x + y*y but I think readability is important, so maybe depends on usage.

mikofski mentioned this issue Mar 5, 2021

use horner's method in disc #1183

Merged

9 tasks

kandersolar added the performance label Mar 15, 2023

mikofski closed this as completed in #1183 Mar 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use horners method in DISC instead of calculating so many powers and polynomials #1180

use horners method in DISC instead of calculating so many powers and polynomials #1180

mikofski commented Feb 26, 2021

cwhanse commented Feb 26, 2021

wholmgren commented Feb 26, 2021 •

edited

Loading

mikofski commented Feb 26, 2021

cwhanse commented Feb 26, 2021

mikofski commented Feb 27, 2021

adriesse commented Mar 18, 2021

mikofski commented Mar 18, 2021 •

edited

Loading

use horners method in DISC instead of calculating so many powers and polynomials #1180

use horners method in DISC instead of calculating so many powers and polynomials #1180

Comments

mikofski commented Feb 26, 2021

cwhanse commented Feb 26, 2021

wholmgren commented Feb 26, 2021 • edited Loading

mikofski commented Feb 26, 2021

cwhanse commented Feb 26, 2021

mikofski commented Feb 27, 2021

adriesse commented Mar 18, 2021

mikofski commented Mar 18, 2021 • edited Loading

wholmgren commented Feb 26, 2021 •

edited

Loading

mikofski commented Mar 18, 2021 •

edited

Loading