Skip to content

BUG: cluster: cophenet intercept invalid linkage matrix count #22187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 27, 2025

Conversation

tylerjereddy
Copy link
Contributor

@tylerjereddy tylerjereddy commented Dec 25, 2024

  • Fixes BUG: Segmentation Fault when passing a special array to scipy.cluster.hierarchy.cophenet #22183.

  • The input data from BUG: Segmentation Fault when passing a special array to scipy.cluster.hierarchy.cophenet #22183 causes an out of bounds memory access in a 1-D memoryview in the Cython cophenetic_distances function. As I argue in the source code comment, regardless of the algorithm intention, and even in light of the paucity of tests for cophenet, I don't think there can be much argument against enforcing a loop invariant that an index doesn't run past the end of a bufffer. Catch the excessive cluster membership (count) in the fourth column of the invalid linkage matrix Z and raise an appropriate ValueError to avoid the segfault.

  • I used pytest-repeat to confirm that the segfault disappears with this patch on > 200 repeat incantations.

  • While I would agree that it would be preferable to have an algorithm expert clean up the root cause, it doesn't really seem particularly controversial to me to enforce this loop invariant until such a cleanup happens.

[skip circle]

@tylerjereddy tylerjereddy added defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.cluster labels Dec 25, 2024
@github-actions github-actions bot added the Cython Issues with the internal Cython code base label Dec 25, 2024
@tylerjereddy tylerjereddy added this to the 1.16.0 milestone Dec 25, 2024
@lucascolley lucascolley changed the title BUG: cophenet enforce invariant BUG: cluster: cophenet enforce invariant Dec 25, 2024
@WarrenWeckesser
Copy link
Member

WarrenWeckesser commented Dec 25, 2024

I think it would be better to strengthen is_valid_linkage() and catch the problem before cophenetic_distances() is called.

If you want, I can work on that, but I probably wouldn't get to an actual PR until next week.

@tylerjereddy
Copy link
Contributor Author

I'll see if I can find a clear violation of the stated rules at least.

@WarrenWeckesser
Copy link
Member

I'll see if I can find a clear violation of the stated rules at least.

I added a comment over in #22183 about the source of the problem.

@tylerjereddy tylerjereddy changed the title BUG: cluster: cophenet enforce invariant BUG: cluster: cophenet intercept invalid linkage matrix count Dec 26, 2024
@tylerjereddy
Copy link
Contributor Author

Ok, I revised to catch the invalid cluster membership count in the fourth column of the linkage matrix Z and raise an appropriate ValueError to avoid the segfault. The violation is more clearly revealed at the end of the paragraph starting with A (n - 1) by 4 matrix Z is... in the scipy.cluster.hierarchy.linkage docstring.

I added Warren as a co-author on the commit. I suggest we may want to keep another issue open to reject Z on the grounds of non-integer fourth column values, another issue Warren described, but this may be slightly more annoying to do (probably not too hard though), since the data type of Z itself is enforced as double.

tylerjereddy and others added 2 commits January 27, 2025 11:43
* Fixes scipygh-22183.

* The input data from scipygh-22183 causes an out
of bounds memory access in a 1-D `memoryview`
in the Cython `cophenetic_distances` function.
Prevent this by enforcing a check for an allowable
upper bound on the cluster membership (4th column)
of the linkage matrix `Z` received by `is_valid_linkage`.

Co-authored-by: Warren Weckesser <[email protected]>
* Adjust `test_gh_22183()` such that the linkage matrix is invalid
only because of an excessive observation count, rather than
also being invalid because the large observation count is
not integral.

* Adjust `test_gh_22183()` such that is properly flushes
through array API backends.
Copy link
Member

@j-bowhay j-bowhay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than one style nit that I'll commit before merging

[skip ci]
@j-bowhay j-bowhay merged commit c1c3dfc into scipy:main Jan 27, 2025
@tylerjereddy tylerjereddy deleted the treddy_issue_22183 branch January 27, 2025 22:49
@tylerjereddy tylerjereddy added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Jan 27, 2025
@tylerjereddy tylerjereddy modified the milestones: 1.16.0, 1.15.2 Feb 5, 2025
tylerjereddy added a commit to tylerjereddy/scipy that referenced this pull request Feb 5, 2025
…y#22187)

* BUG: catch invalid linkage count

* Fixes scipygh-22183.

* The input data from scipygh-22183 causes an out
of bounds memory access in a 1-D `memoryview`
in the Cython `cophenetic_distances` function.
Prevent this by enforcing a check for an allowable
upper bound on the cluster membership (4th column)
of the linkage matrix `Z` received by `is_valid_linkage`.

Co-authored-by: Warren Weckesser <[email protected]>

* TST: PR 22187 revisions

* Adjust `test_gh_22183()` such that the linkage matrix is invalid
only because of an excessive observation count, rather than
also being invalid because the large observation count is
not integral.

* Adjust `test_gh_22183()` such that is properly flushes
through array API backends.

* DOC: backticks

---------

Co-authored-by: Warren Weckesser <[email protected]>
Co-authored-by: Jake Bowhay <[email protected]>
@tylerjereddy tylerjereddy mentioned this pull request Feb 5, 2025
7 tasks
tylerjereddy added a commit that referenced this pull request Feb 16, 2025
* BUG: immortalize uarray global strings in order to prevent negative refcounts (#22038)

#### Reference issue
Fixes #21214
Closes #21218

#### What does this implement/fix?
This PR addresses the segfaults caused by the occurrence of negative refcounts when uarray static strings were being released when the interpreter didn't exist.

---------

Co-authored-by: peterbell10 <[email protected]>

* BUG: special: Fix unchecked memory allocations in `specfun.h` (#22080)

* WIP: Fix unchecked memory allocation in aswfa()

Summary of the changes:

* Add another SF error code, "SF_ERROR_NOMEM", that is used by ufuncs
  that require internally allocated memory.  Just like the other possible
  errors, how this error is handled can be queried and set using `geterr()`
  and `seterr()`.  Unlike the other error conditions, the default behavior
  is to raise an exception if a memory allocation fails.

* Modify the function `aswfa()` in `specfun.h` to return a status code
  that indicates when a mempory allocation failed.

* Modify the functions `prolate_aswfa_nocv()`, `oblate_aswfa_nocv()`,
  `prolate_aswfa()` and `oblate_aswfa()`  in `sphd_wave.h` to check
  the return value of `aswfa()`, and set the error state and return values
  appropriately if `aswfa()` returns `NoMemory`.

* MAINT: Rename 'nomem' -> 'memory'.

* Fix all uses of unchecked allocations in specfun.h

* Fix aswfa() in specfunc.h

* Fix refguide check.

* Add comment about the return value when nonconvergence is detected in rmn2l.

* Change (c/m)alloc calls to use 'new' wrapped in a unique_ptr.

* Fix a few more names passed to set_error()

* Set outputs to nan in mtu12 when alloc fails
Also update docstring-like comments of some C++ functions.

* BUG: cluster: `cophenet` intercept invalid linkage matrix count (#22187)

* BUG: catch invalid linkage count

* Fixes gh-22183.

* The input data from gh-22183 causes an out
of bounds memory access in a 1-D `memoryview`
in the Cython `cophenetic_distances` function.
Prevent this by enforcing a check for an allowable
upper bound on the cluster membership (4th column)
of the linkage matrix `Z` received by `is_valid_linkage`.

Co-authored-by: Warren Weckesser <[email protected]>

* TST: PR 22187 revisions

* Adjust `test_gh_22183()` such that the linkage matrix is invalid
only because of an excessive observation count, rather than
also being invalid because the large observation count is
not integral.

* Adjust `test_gh_22183()` such that is properly flushes
through array API backends.

* DOC: backticks

---------

Co-authored-by: Warren Weckesser <[email protected]>
Co-authored-by: Jake Bowhay <[email protected]>

* MAINT: stats.Mixture: make return type consistent

* MAINT: stats.Mixture: fix inverse functions when mean is undefined (#22337)

* MAINT: stats.Mixture: fix inverse when mean is undefined

* Apply suggestions from code review

* Update scipy/stats/_distribution_infrastructure.py

* BUG: special: Fix unchecked malloc in stirling2.h (#22339)

Use std::unique_ptr and new (std::nothrow) instead of malloc.

Closes gh-22336.

* BUG: sparse: fix selecting wrong dtype for coo coords (#22353)

* Fixes bug with selecting wrong dtype for coo coords

* add test for reshape having wrong dtype

* test that smaller dtype is maintained even if intermediate values are big

---------

Co-authored-by: Dan Schult <[email protected]>

* TST: interpolate: small tolerance bump to TestAAA.test_basic_functions

* MAINT: Stop installing libhighs

Co-authored-by: drew-parsons <[email protected]>

Closes #22349

* BUG: sparse.linalg.norm: fix return type (#22372)

cupy reports cupy/cupy#8866 that
sparse.linalg.norm returns a python number instead of a 0-dim numpy array

* revert NotImplemented return values in dot & others (#22373)

* DOC: sparse.linalg: add two recent functions to namespace and fix doctests (#22374)

* fix doctests and make two recent functions visible

* update refguide

[docs only]

* fix capitalization   [docs only]

Co-authored-by: Daniel Schmitz <[email protected]>

---------

Co-authored-by: Daniel Schmitz <[email protected]>

* MAINT: stats.zmap: restore support for complex data (#22405)

* MAINT: stats.zmap: restore support for complex data

* MAINT: stats.zmap: fix masked array support

* Apply suggestions from code review

* BUG: scipy.spatial: Fix inaccurate orthonormalization in `Rotation.from_matrix()` (#22418)

* BUG: fix inaccurate orthonormalization in Rotation.from_matrix()

* Code review updates

---------

Co-authored-by: Scott Shambaugh <[email protected]>

* BUG: special: Fix a memory leak in the AMOS function besy(). (#22423)

Use a unique_ptr to manage the heap-allocated array.

Closes gh-22314.

* CI: build Linux aarch64 wheels on GitHub Actions instead of Cirrus CI

This unifies the wheels builds, which is a lot easier on maintenance.

In the process, did the following:
- update to setup-miniconda 3.1.1. for GHA Linux arm64 runner support
- add `conda-remove-defaults` to `setup-miniconda`

* CI: migrate Linux aarch64 job to GitHub Actions

* CI: remove Cirrus CI configuration [wheel build]

* BUG: io.loadmat: throw error on file containing all zeros (#22447)

[ci skip]

* BUG: PR 22471 revisions

* Fixes issues related to backport manual merge conflict resolution
during the preparation of SciPy `1.15.2`.

* Restore an accidentally-removed size-0 array check in
`_contains_nan()` function--this allows SciPy `stats` tests
to pass again on the maintenance branch.

* Remove some accidentally-added tests in `test_mio.py`

[skip circle]

* BUG: sparse: fix spmatrix indexing with None and implicit padding to match matrix behavior (#22472)

* fix index using None when index is padded with :

* align None indexing for spmatrix with np.matrix behavior. And tests.

* rewrite spmatrix handling to ease reading

* MAINT: pearsonr SIMD-related shim

* Fixes #22479.

* This small patch allows `test_stats.py::TestRegression::test_regressZEROX`
to pass on x86_64 Linux instead of failing due to an extra division by
zero warning over a fairly narrow range of NumPy versions near `1.25.2`
where SIMD implementation details appear to have been slightly
different.

[skip circle]

* DOC: PR 22471 revisions

* Update the SciPy `1.15.2` release notes following backport
activity.

* BUG: wrap median_filter stability (#22402)

* An uninitialized data structure in `_rank_filter()` C++ code
was causing stochastic algorithm failures on Linux but not MacOS,
so force a zero-based initialization. This allows `test_gh_22250`
to pass over thousands of `pytest-repeat` incantations.

* Add a regression test for gh-22333, which reliably passes
on this branch with `pytest-repeat` (and fails on `main` with
sufficient repeats).

* MAINT: integrate.cumulative_simpson: bump test tolerance

* do not check dtype in test_compare_with_GCVSPL

* CI: PR 22471 revisions

* temporarily disable testing against `meson` `master` branch
in CI because of gh-22534.

* DOC: PR 22471 revisions

* Update the SciPy `1.15.2` release notes following
additional backport activity.

* CI: PR 22471 wheel builds [wheel build]

* This is an empty commit to test the wheel build
matrix in PR 22471.

[wheel build]

---------

Co-authored-by: Edgar Andrés Margffoy Tuay <[email protected]>
Co-authored-by: peterbell10 <[email protected]>
Co-authored-by: Warren Weckesser <[email protected]>
Co-authored-by: Jake Bowhay <[email protected]>
Co-authored-by: Matt Haberland <[email protected]>
Co-authored-by: Parth Nobel <[email protected]>
Co-authored-by: Dan Schult <[email protected]>
Co-authored-by: Rohit Goswami <[email protected]>
Co-authored-by: Daniel Schmitz <[email protected]>
Co-authored-by: Scott Shambaugh <[email protected]>
Co-authored-by: Scott Shambaugh <[email protected]>
Co-authored-by: Ralf Gommers <[email protected]>
Co-authored-by: Matthew Brett <[email protected]>
Co-authored-by: Charles Bousseau <[email protected]>
@tylerjereddy tylerjereddy removed the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Feb 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Cython Issues with the internal Cython code base defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.cluster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Segmentation Fault when passing a special array to scipy.cluster.hierarchy.cophenet
3 participants