Skip to content

use horners method in DISC instead of calculating so many powers and polynomials #1180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mikofski opened this issue Feb 26, 2021 · 7 comments · Fixed by #1183
Closed

use horners method in DISC instead of calculating so many powers and polynomials #1180

mikofski opened this issue Feb 26, 2021 · 7 comments · Fixed by #1183

Comments

@mikofski
Copy link
Member

Describe the bug
the disc method has a lot of power and polynomials. IMO whenever possible, polynomials should use Horner's method which is already implemented in np.polyval but is easy to reimplement if needed.

To Reproduce

# powers of kt will be used repeatedly, so compute only once
kt2 = kt * kt # about the same as kt ** 2
kt3 = kt2 * kt # 5-10x faster than kt ** 3
bools = (kt <= 0.6)
a = np.where(bools,
0.512 - 1.56*kt + 2.286*kt2 - 2.222*kt3,
-5.743 + 21.77*kt - 27.49*kt2 + 11.56*kt3)
b = np.where(bools,
0.37 + 0.962*kt,
41.4 - 118.5*kt + 66.05*kt2 + 31.9*kt3)
c = np.where(bools,
-0.28 + 0.932*kt - 2.048*kt2,
-47.01 + 184.2*kt - 222.0*kt2 + 73.81*kt3)
delta_kn = a + b * np.exp(c*am)
Knc = 0.866 - 0.122*am + 0.0121*am**2 - 0.000653*am**3 + 1.4e-05*am**4

0.512 - 1.56*kt + 2.286*kt2 - 2.222*kt3

becomes

0.512 - kt * (1.56 + kt * (2.286 - 2.222 * kt))

or

np.polyval([-2.222, 2.286, -1.56, 0.512], kt)

Expected behavior
a few math tricks like this (atan2, log1p, Horner's method, etc.) are musts for efficiency and numerical stability.

Versions:

  • pvlib.__version__: 0.8.1
@cwhanse
Copy link
Member

cwhanse commented Feb 26, 2021

np.polyval([-2.222, 2.286, -1.56, 0.512], kt)

is more readable to me than the nested parentheses.

I'd be interested to see the performance difference.

@wholmgren
Copy link
Member

wholmgren commented Feb 26, 2021

I'm open to a PR that includes an asv test for efficiency and a regular test for precision. Keep in mind that _disc_kn is used within a loop in gti_dirint.

For anyone that comes across this issue when trying to understand why disc doesn't produce results that are as accurate as you'd like: this is not the cause of your problem. Search the pvlib google group and stack overflow tag for previous explanations.

@mikofski
Copy link
Member Author

by the numbers, Horner's is 2x faster which np.polyval is 50% slower:

In [32]: am = np.linspace(1.5, 6.0, 40)

In [33]: Knc = 0.866 - 0.122*am + 0.0121*am**2 - 0.000653*am**3 + 1.4e-05*am**4

In [34]: np.allclose(Knc, 0.866 + am*(-0.122 + am*(0.0121 + am*(-0.000653 + 1.4e-05*am))))
Out[34]: True

In [35]: np.allclose(Knc, np.polyval(p, am))
Out[35]: True

In [36]: %timeit 0.866 - 0.122*am + 0.0121*am**2 - 0.000653*am**3 + 1.4e-05*am**4
10.9 µs ± 259 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [37]: %timeit np.polyval(p, am)
16 µs ± 128 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [38]: %timeit 0.866 + am*(-0.122 + am*(0.0121 + am*(-0.000653 + 1.4e-05*am)))
4.67 µs ± 40 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

since kt2 and kt3 are already pre-calculated and reused in a, b, and c I don't expect there to be any more time to be gained there, so really it's just Knc that could be 2x faster using Horner's. Tough call. Not worth it?

@cwhanse
Copy link
Member

cwhanse commented Feb 26, 2021

Good info about np.polyval, convenient but slower than doing it ourselves. In this case, performance > readability.

Not worth it?

It's a small effort to improve a frequently-used function so I'd say yes, worth it.

@mikofski
Copy link
Member Author

I counted and surprisingly, there are the same flops in the existing kt, kt2, kt3 code than in Horner's, in fact, there are exactly 3 more, because it's unnecessary to pre-calculate kt, kt2, kt3 when using Horner's

By the numbers:

In [3]: kt = np.linspace(0.1, 0.9, 90)

In [4]: %paste
def test1(kt):
    kt2 = kt * kt  # about the same as kt ** 2
    kt3 = kt2 * kt  # 5-10x faster than kt ** 3

    bools = (kt <= 0.6)
    a = np.where(bools,
              0.512 - 1.56*kt + 2.286*kt2 - 2.222*kt3,
              -5.743 + 21.77*kt - 27.49*kt2 + 11.56*kt3)
    b = np.where(bools,
              0.37 + 0.962*kt,
              41.4 - 118.5*kt + 66.05*kt2 + 31.9*kt3)
    c = np.where(bools,
              -0.28 + 0.932*kt - 2.048*kt2,
              -47.01 + 184.2*kt - 222.0*kt2 + 73.81*kt3)
    return a, b, c

## -- End pasted text --

In [7]: a, b, c = test1(kt)

In [9]: %paste
def test2(kt):
    bools = (kt <= 0.6)
    a = np.where(bools,
              0.512 + kt*(-1.56 + kt*(2.286 - 2.222*kt)),
              -5.743 + kt*(21.77 + kt*(-27.49 + 11.56*kt)))
    b = np.where(bools,
              0.37 + 0.962*kt,
              41.4 + kt*(-118.5 + kt*(66.05 + 31.9*kt)))
    c = np.where(bools,
              -0.28 + kt*(0.932 - 2.048*kt),
              -47.01 + kt*(184.2 + kt*(-222.0 + 73.81*kt)))
    return a, b, c

## -- End pasted text --

In [10]: x, y, z = test2(kt)

In [11]: np.allclose(a,x)
Out[11]: True

In [12]: np.allclose(b,y)
Out[12]: True

In [13]: np.allclose(c,z)
Out[13]: True

In [14]: %timeit test2(kt)
23.3 µs ± 348 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [15]: %timeit test2(kt)
23.9 µs ± 462 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [16]: %timeit test1(kt)
26.2 µs ± 179 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [17]: %timeit test1(kt)
27.3 µs ± 661 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Surprise! Horners is about 10% faster for this part of the DISC too!

@adriesse
Copy link
Member

Miscellaneous observations:

  • Since kt is in the range 0-1 I would not expect numerical issues with any evaluation method. Air mass in the range 1-40 should be fine too.
  • For speed, you could also look into using np.dot with the factors.
  • The average scientist or student exploring the inner workings of pvlib will probably find the original easiest to understand.
  • Thinking about code clarity, you could consider changing the name bools to is_cloudy.

@mikofski
Copy link
Member Author

mikofski commented Mar 18, 2021

Great feedback!

the average scientist or student exploring the inner workings of pvlib will probably find the original easiest to understand.

Possibly true, but I feel a bit like Horner's method, log1p, & arctan2 are tools of the trade like unit tests, Sphinx doc RST markup, Git, etc., that contributors & students of scientific computing must learn

Thinking about code clarity, you could consider changing the name bools to is_cloudy

I agree

For speed, you could also look into using np.dot with the factors.

When I test np.dot or np.linalg.norm or np.polyval they're usually slower than just x*x + y*y but I think readability is important, so maybe depends on usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants