
SpearmanCorrCoef is very slow on large tensors with many duplicate elements #3102


Open
gratus907 opened this issue May 23, 2025 · 1 comment · May be fixed by #3103
Labels
enhancement New feature or request

Comments

@gratus907

🚀 Feature

More efficient implementation of SpearmanCorrCoef

Motivation

The current implementation of SpearmanCorrCoef is very slow on large tensors with many duplicate elements, because _rank_data resolves tied values by iterating over each element in a Python loop.

Pitch

Improve the implementation of the _rank_data function as follows:

import torch
from torch import Tensor


def _rank_data(data: Tensor) -> Tensor:
    """Rank the elements of ``data``, averaging the ranks of tied values."""
    n = data.numel()
    # Assign provisional ranks 1..n according to sorted order.
    rank = torch.empty_like(data, dtype=torch.int32)
    idx = data.argsort()
    rank[idx] = torch.arange(1, n + 1, dtype=torch.int32, device=data.device)

    # Group duplicates and replace each provisional rank by the mean
    # rank of its group, using only vectorized ops.
    uniq, inv, counts = torch.unique(
        data, sorted=True, return_inverse=True, return_counts=True
    )
    sum_ranks = torch.zeros_like(uniq, dtype=torch.int32)
    sum_ranks.scatter_add_(0, inv, rank)
    mean_ranks = sum_ranks / counts
    return mean_ranks[inv]

which uses torch.unique and scatter_add_ to average the ranks of tied values without any Python loop.

@gratus907 gratus907 added the enhancement New feature or request label May 23, 2025

Hi! Thanks for your contribution! Great first issue!

@gratus907 gratus907 linked a pull request May 23, 2025 that will close this issue