Commit b64f438

BUG: Clip corr edge cases between -1.0 and 1.0 (#61154)
* Clip correlation coefficient between -1 and 1
* Added test to check if corr within bounds
* Added tuple to mistyped parameter
* Transferred np.clip to algos.nancorr
* Clip covxy / divisor instead of result
* Clip covxy / divisor within nogil
* Added whatsnew note
* Replaced long entry with single entry

---------

Co-authored-by: John Hendricks <[email protected]>
1 parent 4de503d commit b64f438

3 files changed

+21
-3
lines changed


doc/source/whatsnew/v3.0.0.rst

+1
@@ -673,6 +673,7 @@ Timezones
 
 Numeric
 ^^^^^^^
+- Bug in :meth:`DataFrame.corr` where numerical precision errors resulted in correlations above ``1.0`` (:issue:`61120`)
 - Bug in :meth:`DataFrame.quantile` where the column type was not preserved when ``numeric_only=True`` with a list-like ``q`` produced an empty result (:issue:`59035`)
 - Bug in ``np.matmul`` with :class:`Index` inputs raising a ``TypeError`` (:issue:`57079`)
 
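For pandas versions that predate this fix, one possible user-side workaround is to clamp the correlation matrix after computing it. This is an assumption about how a caller might cope, not something the commit prescribes. The sketch below reuses the near-degenerate input from the test added in this commit and relies only on `DataFrame.corr` and `DataFrame.clip`.

```python
import pandas as pd

# Same near-degenerate input as the test added in this commit:
# the two "y" values differ only by a few units in the last place.
df = pd.DataFrame({"x": [0, 1], "y": [1.35951, 1.3595100000000007]})

# Clamp every coefficient to [-1, 1]; NaN entries stay NaN.
corr = df.corr().clip(lower=-1.0, upper=1.0)
```

With the fix applied, the clamp is a no-op, so the same code behaves identically on older and newer versions.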

pandas/_libs/algos.pyx

+8 -3
@@ -353,10 +353,9 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
         float64_t[:, ::1] result
         uint8_t[:, :] mask
         int64_t nobs = 0
-        float64_t vx, vy, dx, dy, meanx, meany, divisor, ssqdmx, ssqdmy, covxy
+        float64_t vx, vy, dx, dy, meanx, meany, divisor, ssqdmx, ssqdmy, covxy, val
 
     N, K = (<object>mat).shape
-
     if minp is None:
         minpv = 1
     else:
@@ -389,8 +388,14 @@ def nancorr(const float64_t[:, :] mat, bint cov=False, minp=None):
                 else:
                     divisor = (nobs - 1.0) if cov else sqrt(ssqdmx * ssqdmy)
 
+                    # clip `covxy / divisor` to ensure coeff is within bounds
                     if divisor != 0:
-                        result[xi, yi] = result[yi, xi] = covxy / divisor
+                        val = covxy / divisor
+                        if val > 1.0:
+                            val = 1.0
+                        elif val < -1.0:
+                            val = -1.0
+                        result[xi, yi] = result[yi, xi] = val
                     else:
                         result[xi, yi] = result[yi, xi] = NaN
 
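The clamp in the hunk above can be sketched in plain Python for a single pair of columns. This is an illustration only: `clipped_corr` is a hypothetical helper, not a pandas function, and the single-pass accumulation loop is an assumption, since the diff shows only the final division and clamp. Only the correlation branch is modelled; the `cov=True` branch is left out.

```python
import math


def clipped_corr(xs, ys):
    """Pearson correlation of two sequences, clamped to [-1, 1].

    Hypothetical helper for illustration; not part of pandas.
    """
    nobs = 0
    meanx = meany = ssqdmx = ssqdmy = covxy = 0.0
    for vx, vy in zip(xs, ys):
        # Skip pairs where either value is missing.
        if math.isnan(vx) or math.isnan(vy):
            continue
        nobs += 1
        # Assumed single-pass (Welford-style) update of means and sums.
        dx = vx - meanx
        dy = vy - meany
        meanx += dx / nobs
        meany += dy / nobs
        ssqdmx += (vx - meanx) * dx
        ssqdmy += (vy - meany) * dy
        covxy += (vx - meanx) * dy

    if nobs == 0:
        return math.nan
    divisor = math.sqrt(ssqdmx * ssqdmy)
    if divisor == 0:
        # A constant column has no defined correlation.
        return math.nan
    val = covxy / divisor
    # Same clamp as in the diff: rounding can push the ratio past +/-1.
    if val > 1.0:
        val = 1.0
    elif val < -1.0:
        val = -1.0
    return val
```

Calling `clipped_corr([0, 1], [1.35951, 1.3595100000000007])` exercises the same input as the new test; whatever rounding does to the intermediate sums, the returned value stays within the valid range.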

pandas/tests/frame/methods/test_cov_corr.py

+12
@@ -485,3 +485,15 @@ def test_corrwith_min_periods_boolean(self):
         result = df_bool.corrwith(ser_bool, min_periods=3)
         expected = Series([0.57735, 0.57735], index=["A", "B"])
         tm.assert_series_equal(result, expected)
+
+    def test_corr_within_bounds(self):
+        df1 = DataFrame({"x": [0, 1], "y": [1.35951, 1.3595100000000007]})
+        result1 = df1.corr().max().max()
+        expected1 = 1.0
+        tm.assert_equal(result1, expected1)
+
+        rng = np.random.default_rng(seed=42)
+        df2 = DataFrame(rng.random((100, 4)))
+        corr_matrix = df2.corr()
+        assert corr_matrix.min().min() >= -1.0
+        assert corr_matrix.max().max() <= 1.0
