Skip to content

GH-81620: Add random.binomialvariate() #94719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Jul 13, 2022
Merged

Conversation

rhettinger
Copy link
Contributor

@rhettinger rhettinger commented Jul 9, 2022

Here are some comparisons between actual and expected histograms of the random variable across various input parameters chosen to cover all the code paths:

raymond@raymonds-mbp cpython % ./python.exe show_dist2.py
n=0	p=0.5	times=131072
[131072] actual
[131072] expected

n=1	p=0.5	times=131072
[65449, 65623] actual
[65536, 65536] expected

n=1	p=0.1	times=131072
[118042, 13030] actual
[117965, 13107] expected

n=1	p=0.9	times=131072
[13093, 117979] actual
[13107, 117965] expected

n=1	p=0.0	times=131072
[131072, 0] actual
[131072, 0] expected

n=1	p=1.0	times=131072
[0, 131072] actual
[0, 131072] expected

n=2	p=0.5	times=131072
[32623, 65655, 32794] actual
[32768, 65536, 32768] expected

n=3	p=0.5	times=131072
[16265, 49238, 49219, 16350] actual
[16384, 49152, 49152, 16384] expected

n=3	p=0.5	times=131072
[16401, 49226, 48791, 16654] actual
[16384, 49152, 49152, 16384] expected

n=4	p=0.5	times=131072
[8182, 32534, 49367, 32803, 8186] actual
[8192, 32768, 49152, 32768, 8192] expected

n=2	p=0.25	times=131072
[73709, 49328, 8035] actual
[73728, 49152, 8192] expected

n=3	p=0.25	times=131072
[55206, 55418, 18407, 2041] actual
[55296, 55296, 18432, 2048] expected

n=3	p=0.25	times=131072
[55739, 54719, 18568, 2046] actual
[55296, 55296, 18432, 2048] expected

n=4	p=0.25	times=131072
[41531, 55530, 27212, 6296, 503] actual
[41472, 55296, 27648, 6144, 512] expected

n=2	p=0.75	times=131072
[8104, 49441, 73527] actual
[8192, 49152, 73728] expected

n=3	p=0.75	times=131072
[2137, 18494, 55245, 55196] actual
[2048, 18432, 55296, 55296] expected

n=3	p=0.75	times=131072
[2125, 18426, 55237, 55284] actual
[2048, 18432, 55296, 55296] expected

n=4	p=0.75	times=131072
[501, 6189, 27652, 55351, 41379] actual
[512, 6144, 27648, 55296, 41472] expected

n=21	p=0.5	times=8388608
[3, 83, 861, 5459, 23954, 81369, 216626, 464857, 814107, 1175105, 1410003, 1412399, 1175408, 813977, 465240, 217611, 81517, 23835, 5267, 849, 75, 3] actual
[4, 84, 840, 5320, 23940, 81396, 217056, 465120, 813960, 1175720, 1410864, 1410864, 1175720, 813960, 465120, 217056, 81396, 23940, 5320, 840, 84, 4] expected

n=23	p=0.44	times=33554432
[50, 996, 8357, 46376, 182566, 546412, 1288350, 2457192, 3860402, 5054928, 5561908, 5163281, 4055424, 2698094, 1514653, 714189, 280545, 91110, 23760, 4920, 827, 85, 7, 0] actual
[54, 980, 8467, 46568, 182945, 546223, 1287525, 2456809, 3860700, 5055678, 5561246, 5164014, 4057440, 2697528, 1513919, 713705, 280384, 90712, 23758, 4912, 772, 87, 6, 0] expected

n=23	p=0.56	times=33554432
[1, 12, 81, 804, 4820, 23672, 90753, 279718, 713524, 1514626, 2698389, 4061304, 5162182, 5560620, 5056002, 3861575, 2455442, 1287630, 544792, 182352, 46622, 8448, 1002, 61] actual
[0, 6, 87, 772, 4912, 23758, 90712, 280384, 713705, 1513919, 2697528, 4057440, 5164014, 5561246, 5055678, 3860700, 2456809, 1287525, 546223, 182945, 46568, 8467, 980, 54] expected

The above output was created using this test function:

from random import binomialvariate
from collections import Counter
from math import comb

def compare(n=1, p=0.5, times=2**17):
    c = Counter(binomialvariate(n, p) for i in range(times))
    assert c.total() == times
    expected = []
    actual = []
    for r in range(n + 1):
        exp = round(times * comb(n, r) * p ** r * (1 - p) ** (n - r))
        act = c.pop(r, 0)
        expected.append(exp)
        actual.append(act)
    assert not c
    print(f'{n=}\t{p=}\t{times=}')
    print(actual, 'actual')
    print(expected, 'expected')
    print()

This code is useful for instrumenting how many calls to random() are made:

from random import _inst, binomialvariate as B
from unittest.mock import patch

def run(n=1, p=0.5):
    with patch.object(_inst, 'random', side_effect=_inst.random) as mo:
        b = B(n, p)
        print(f"B({n}, {p})={b}   {mo.call_count=}")

This demonstrates that how few loops are needed:

# Fast path
B(1, 0.5)=1   mo.call_count=1

# BG for n*p < 10
B(10, 0.5)=6   mo.call_count=6
B(10, 0.25)=2   mo.call_count=3
B(10, 0.01)=0   mo.call_count=1
B(10, 0.75)=8   mo.call_count=3
B(10, 0.99)=10   mo.call_count=1

# BTRS for n*p >= 10
B(100000, 0.5)=49853   mo.call_count=4
B(100000, 0.001)=102   mo.call_count=2
B(100000, 0.999)=99909   mo.call_count=2

@rhettinger rhettinger added type-feature A feature request or enhancement stdlib Python modules in the Lib dir labels Jul 9, 2022
@rhettinger rhettinger requested review from mdickinson and tim-one July 9, 2022 20:26
@rhettinger rhettinger self-assigned this Jul 9, 2022
@rhettinger rhettinger merged commit ed06ec1 into python:main Jul 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants