Add Minerva algorithm to Arlo

nealmcb · nealmcb · commit b97f650c5d1c · 2020-11-10T15:45:37.000-07:00
feat: add athena_sample_sizes option

Add an athena_sample_sizes() shim, to call Athena using the same call signature as bravo_sample_sizes()

Calculate Athena p-values
Define get_athena_test_statistics() shim patterned after bravo.get_test_statistics()
Still following earlier integration work based on ab525ad from 2020-05-05

Move shim to new audit_math location
Based on Arlo changes in the meantime.

Dirty patch to bravo.py for using athena

Allow user to choose audit_math implementation at startup.
The default is now to use Athena, but the $ARLO_ALGORITHM environment variable can be used to select an algorithm at run time. E.g.:

    ARLO_ALGORITHM=bravo ./run-dev.sh

N.B.: to avoid having to add too much other logic before we decide on the best way to integrate this, we simply conditionally replace bravo.bravo_sample_sizes with athena_sample_sizes:

    if ALGORITHM == "athena":
        bravo_sample_sizes = athena_sample_sizes

Also catch up with API updates in athena repo.

Switch to minerva, adapt to Athena api changes
FIXME: bravo tests still broken
Need to check the minerva test results.

Temp README changes; logging setup; more tests

Temp: run black, note issues

Resolve lint, typing errors; fstring logging
Turned off logging-fstring-interpolation in .pylintrc. I think the possible tiny performance penalty is offset by the readability gains, as noted at pylint-dev/pylint#2354 (comment)

Clarify README and auth0.md; fix .pylintrc format

Add athena from git to Pipfile
Note changes in Pipfile.lock - not sure if you want the rest of the packages to be updated, or to have specific version numbers.

Add logging during startup

Fix typo, get node, cli buildpacks to build

Specify Heroku version python-3.7.8 in runtime.txt

Fix Pipfile.lock syntax

Clean up some logging
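The run-time selection described in the commit message can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the code in this commit: the two estimator bodies are placeholders, `select_sample_size_function` is a hypothetical helper name, and the `minerva` default is assumed (the message says the new algorithm is the default, but the contents of `server/config.py` are not shown here).

```python
import os
from typing import Callable, Mapping


def bravo_sample_sizes(*args) -> str:
    """Placeholder standing in for the BRAVO estimator (hypothetical body)."""
    return "bravo"


def minerva_sample_sizes(*args) -> str:
    """Placeholder standing in for the Minerva shim (hypothetical body)."""
    return "minerva"


def select_sample_size_function(
    environ: Mapping[str, str] = os.environ,
) -> Callable:
    """Pick an estimator from ARLO_ALGORITHM; the default is an assumption."""
    implementations = {
        "bravo": bravo_sample_sizes,
        "minerva": minerva_sample_sizes,
    }
    algorithm = environ.get("ARLO_ALGORITHM", "minerva")
    if algorithm not in implementations:
        # Fail loudly instead of silently falling back to another algorithm.
        raise ValueError(f"unknown ARLO_ALGORITHM: {algorithm!r}")
    return implementations[algorithm]


if __name__ == "__main__":
    # Prints the name of whichever estimator the environment selects.
    print(select_sample_size_function().__name__)
```

Unlike the module-level rebinding used in this commit, the sketch returns the chosen function and rejects unrecognized names, which may be worth considering once the integration approach is settled.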
diff --git a/.pylintrc b/.pylintrc
@@ -163,6 +163,7 @@ disable=print-statement,
         duplicate-code,
         broad-except,
         no-else-raise,
+        logging-fstring-interpolation,
 
 
 
diff --git a/Pipfile b/Pipfile
@@ -40,6 +40,7 @@ sqlalchemy = "*"
 typing-extensions = "*"
 pytest-testmon = "*"
 sentry-sdk = {extras = ["flask"], version = "*"}
+athena = {editable = true,git = "https://github.com/filipzz/athena.git"}
 
 [requires]
 python_version = "3.8"
diff --git a/Pipfile.lock b/Pipfile.lock
diff --git a/README.md b/README.md
@@ -109,11 +109,17 @@ Rather than manually config the environment, you can also run the setup script d
 
 ### Creating Organizations and Administrators
 
-Organizations are, for example, the State of
+Arlo identifies and authenticates three classes of end-user:
+organization administrators, jurisdiction administrators,
+and audit boards.
+
+Organizations identify administrators and jurisdictions for whom they
+administrate audits. Jurisdictions identify their own administrators,
+as well as audit boards. Audit boards enter ballot-by-ballot auditing data.
+
+Thus, organizations are, for example, the State of
 Massachusetts. Administrators are individual users that administer
-audits for an organization. All authentication is done via auth0 with
-email addresses, so users in the Arlo database also need to be
-mirrored in the appropriate auth0 tenant user database.
+audits for an organization.
 
 To create an organization in the database:
 
@@ -127,6 +133,26 @@ Then, to create an administrator for the organization:
 
 which returns the `user_id`.
 
+This can be automated via:
+
+    org=$(python -m scripts.create-org MyOrg)
+    python -m scripts.create-admin $org my_administrator@example.org
+
+The email addresses authorized to administer jurisdictions are identified
+via the `filesheet.csv` file uploaded by the organization administrator.
+
+After the jurisdiction admin creates audit boards for each audit,
+they download the `Audit Board Credentials for Data Entry.pdf` files.
+Each one contains a URL for an audit board, with an embedded
+authentication token.
+
+All authentication is done using OAuth 2.0
+(e.g. via [Auth0](https://auth0.com/)) with
+email addresses, so users in the Arlo database should typically also be
+configured in the appropriate Auth0 tenant user database.
+
+For design details, see [Arlo's use of Auth0](docs/auth0.md).
+
 ### Resetting the Database When Upgrading Arlo
 
 If you're upgrading Arlo, right now the only way is to destroy and
@@ -155,10 +181,11 @@ We recommend Ubuntu 18.0.4.
 
 #### Automatic configuration and setup
 
-If you would just like to run Arlo and do not wish to setup a custom configuration, you can run `pipenv run python -m scripts.setup-dev`, which provides interactive configuration. The script optionally installs VotingWorks' [nOAuth](https://github.com/votingworks/nOAuth) locally, runs it, and configures Arlo to use it. It creates the necessary audit administrator and jurisdiction administrator credentials discussed above, and launches a dev instance of Arlo. Once you have navigated to `localhost:3000` in your broswer, you should be able to log in as an audit admin using the credentials you configured earlier in the script. 
+If you would just like to run Arlo and do not wish to set up a custom configuration, you can run `pipenv run python -m scripts.setup-dev`, which provides interactive configuration. The script optionally installs VotingWorks' [nOAuth](https://github.com/votingworks/nOAuth) locally, runs it, and configures Arlo to use it. It creates the necessary audit administrator and jurisdiction administrator credentials discussed above, and launches a dev instance of Arlo. Once you have navigated to `localhost:3000` in your browser, you should be able to log in as an audit admin using the credentials you configured earlier in the script.
 
 #### Troubleshooting
 
+- Beware: `make format-server` runs `black`, which rewrites all files under the current directory in place without making a backup.
 - Postgres is best installed by grabbing `postgresql-server-dev-10` and `postgresql-client-10`.
 - `psycopg2` has known issues depending on your install (see, e.g., [here](https://github.com/psycopg/psycopg2/issues/674)). If you run into issues, switch `psycopg2` to `psycopg2-binary` in the Pipfile.
 - `pipenv install` can hang attempting to get [a lock on the packages it's installing](https://github.com/pypa/pipenv/issues/3827). To get around this, add the `--skip-lock` flag in the Makefile (the first line should be `pipenv install --skip-lock`).
diff --git a/docs/auth0.md b/docs/auth0.md
@@ -12,13 +12,13 @@ Auth0 is used for authentication. Key things to keep in mind:
 - we use two separate Auth0 tenants, one for audit administrators, one
   for jurisdiction administrators, each with its own single
   application, so we can use completely different login screens for
-  both, specifically 2FA for audit administrators and passwordless for
-  jurisdiction administrators.
+  both, specifically 2FA for audit administrators, and both 2FA
+  and passwordless for jurisdiction administrators / audit boards.
 
 - setting up auth0 passwordless requires either creating users via the
   Management API, or letting anyone sign in and filtering on our
-  end. We'll start with the latter, we may do the former at some
-  point.
+  end. We'll start with the latter, creating URLs for audit boards on
+  the fly, but we may do the former at some point.
 
 - right now we're using "Universal Login", where Auth0 controls the
   login page. It's not clear that's the right way forward for Arlo, as
diff --git a/runtime.txt b/runtime.txt
@@ -0,0 +1 @@
+python-3.7.8
diff --git a/server/audit_math/bravo.py b/server/audit_math/bravo.py
@@ -9,10 +9,14 @@
 import math
 from decimal import Decimal, ROUND_CEILING
 from collections import defaultdict
+import logging
 from typing import Dict, Tuple, Optional
 from scipy import stats
 
 from .sampler_contest import Contest
+from .shim import minerva_sample_sizes, get_minerva_test_statistics  # type: ignore
+
+from ..config import ALGORITHM
 
 
 def get_expected_sample_sizes(
@@ -122,6 +126,27 @@ def get_test_statistics(
                     Decimal((1 - winners[winner]["swl"][cand]) / 0.5) ** votes
                 )
 
+    logging.debug(f"bravo test_stats: T={T}")
+
+    if ALGORITHM == "minerva":
+        for winner, winner_res in winners.items():
+            for loser, loser_res in losers.items():
+                res = get_minerva_test_statistics(
+                    0.1,
+                    winner_res["p_w"],
+                    loser_res["p_l"],
+                    sample_results[winner],
+                    sample_results[loser],
+                )
+                logging.debug(
+                    f"minerva test_stats {res=} for: {winner_res['p_w']=}, {loser_res['p_l']=}, {sample_results[winner]=}, {sample_results[loser]=}"
+                )
+                T[(winner, loser)] = 1.0 if res is None else 1.0 / res
+
+        logging.debug(f"minerva test_stats return: T={T}")
+        return T
+
+    # else.....
     return T
 
 
@@ -469,4 +494,10 @@ def compute_risk(
 
         if raw > alpha:
             finished = False
+    logging.debug(f"samples {sample_results}, measurements {measurements}")
     return measurements, finished
+
+
+# Quick-and-dirty way to switch between auditing algorithms: override the function
+if ALGORITHM == "minerva":
+    bravo_sample_sizes = minerva_sample_sizes
diff --git a/server/audit_math/shim.py b/server/audit_math/shim.py
@@ -0,0 +1,195 @@
+"""shim.py: Shim code to interface between the calling conventions expected by the current
+bravo_sample_sizes() code with the API currently provided by the athena module.
+
+Over time we expect both the Arlo and the Athena calling conventions to change, so
+this is a very temporary solution.
+
+TODO: Is it worth finding a way to keep the Audit objects cached?
+Or is it better to make them up for each pairwise estimate as we go?
+"""
+
+import logging
+import math
+from typing import Any
+from athena.audit import Audit  # type: ignore
+
+
+def make_election(risk_limit, p_w: float, p_r: float) -> Any:
+    """
+    Transform fractional shares to an athena Election object.
+
+    Inputs:
+        risk_limit      - the risk-limit for this audit
+        p_w             - the fraction of vote share for the winner
+        p_r             - the fraction of vote share for the loser / runner-up
+    """
+
+    # calculate the undiluted "two-way" share of votes for the winner
+    p_wr = p_w + p_r
+    p_w2 = p_w / p_wr
+
+    contest_ballots = 100000
+    winner = int(contest_ballots * p_w2)
+    loser = contest_ballots - winner
+
+    contest = {
+        "contest_ballots": contest_ballots,
+        "tally": {"A": winner, "LOSER": loser},
+        "num_winners": 1,
+        "reported_winners": ["A"],
+        "contest_type": "PLURALITY",
+    }
+
+    contest_name = "ArloContest"
+    election = {
+        "name": "ArloElection",
+        "total_ballots": contest_ballots,
+        "contests": {contest_name: contest},
+    }
+
+    audit = Audit("minerva", risk_limit)
+    audit.add_election(election)
+    audit.load_contest(contest_name)
+
+    return audit
+
+
+def get_minerva_test_statistics(
+    risk_limit: float, p_w: float, p_r: float, sample_w: int, sample_r: int,
+) -> Any:
+    """
+    Return Minerva p-value
+    TODO: refactor to pass in integer vote shares to allow more exact calculations, incorporate or
+    track round schedule over time, and handle sampling without replacement.
+
+    Inputs:
+        risk_limit      - the risk-limit for this audit
+        p_w             - the fraction of vote share for the winner
+        p_r             - the fraction of vote share for the loser
+        sample_w        - the number of votes for the winner that have already
+                          been sampled
+        sample_r        - the number of votes for the runner-up that have
+                          already been sampled
+
+    Outputs:
+        p_value        - p-value for given circumstances
+
+    FIXME: need new Minerva-specific test cases - are these exactly right?
+    Vs Athena Test cases from https://github.com/gwexploratoryaudits/brla_explore/pull/10/files/988f068e65fd955c8e5d1512865ef5e95a1d7b3c..94693c67aa33a1c642a98336ca5b7fcd32c1ce33#
+    test26: pass
+    >>> get_minerva_test_statistics(0.1, 0.224472184613, 0.12237580158, 50, 36)
+    0.08762086910131112
+
+    test27: fail
+    >>> get_minerva_test_statistics(0.1, 0.224472184613, 0.12237580158, 49, 37)
+    0.12450655512929908
+
+    FIXME: Should this be 1.0?  Or nothing, indicating "None"?
+    >>> get_minerva_test_statistics(0.1, 0.224472184613, 0.12237580158, 0, 0)
+    >>> get_minerva_test_statistics(0.1, 0.75, 0.25, 7, 0)
+    0.05852766346593508
+    """
+
+    # calculate the undiluted "two-way" share of votes for the winner
+    p_wr = p_w + p_r
+    p_w2 = p_w / p_wr
+
+    audit = make_election(risk_limit, p_w, p_r)
+
+    if sample_w or sample_r:
+        round_sizes = [sample_w + sample_r]
+        audit.add_round_schedule(round_sizes)
+        audit.set_observations(round_sizes[0], round_sizes[0], [sample_w, sample_r])
+    else:
+        round_sizes = []
+
+    if round_sizes:
+        status = audit.status[audit.active_contest]
+        risk = status.risks[0]
+    else:
+        risk = None
+
+    logging.info(
+        f"shim get_minerva_test_statistics: margin {(p_w2 - 0.5) * 2} (pw {p_w} pr {p_r}) (sw {sample_w} sr {sample_r}) risk {risk}"
+    )
+
+    return risk
+
+
+def minerva_sample_sizes(
+    risk_limit: float,
+    p_w: float,
+    p_r: float,
+    sample_w: int,
+    sample_r: int,
+    p_completion: float,
+) -> int:
+    """
+    Return Minerva round size based on completion probability, assuming the election outcome is correct.
+    TODO: refactor to pass in integer vote shares to allow more exact calculations, incorporate or
+    track round schedule over time, and handle sampling without replacement.
+
+    Inputs:
+        risk_limit      - the risk-limit for this audit
+        p_w             - the fraction of vote share for the winner
+        p_r             - the fraction of vote share for the loser
+        sample_w        - the number of votes for the winner that have already
+                          been sampled
+        sample_r        - the number of votes for the runner-up that have
+                          already been sampled
+        p_completion    - the desired chance of completion in one round,
+                          if the outcome is correct
+
+    Outputs:
+        sample_size     - the expected sample size for the given chance
+                          of completion in one round
+
+    >>> minerva_sample_sizes(0.1, 0.6, 0.4, 56, 56, 0.7)
+    244
+
+    # FIXME: check this
+    >>> minerva_sample_sizes(0.1, 0.6, 0.4, 0, 0, 0.7)
+    111
+    >>> minerva_sample_sizes(0.1, 0.6, 0.4, 0, 0, 0.9)
+    179
+    """
+
+    # calculate the undiluted "two-way" share of votes for the winner
+    p_wr = p_w + p_r
+    p_w2 = p_w / p_wr
+
+    audit = make_election(risk_limit, p_w, p_r)
+
+    pstop_goal = [p_completion]
+
+    if sample_w or sample_r:
+        round_sizes = [sample_w + sample_r]
+        audit.add_round_schedule(round_sizes)
+        audit.set_observations(round_sizes[0], round_sizes[0], [sample_w, sample_r])
+    else:
+        round_sizes = []
+
+    if round_sizes:
+        status = audit.status[audit.active_contest]
+        below_kmin = status.min_kmins[0] - sample_w
+    else:
+        below_kmin = 0
+
+    res = audit.find_next_round_size(pstop_goal)
+    next_round_size_0 = res["future_round_sizes"][0]
+
+    next_round_size = next_round_size_0 + 2 * below_kmin
+
+    size_adj = math.ceil(next_round_size / p_wr)
+
+    logging.info(
+        f"shim sample sizes: margin {(p_w2 - 0.5) * 2} (pw {p_w} pr {p_r}) (sw {sample_w} sr {sample_r}) pstop {p_completion} below_kmin {below_kmin} raw {next_round_size} scaled {size_adj}"
+    )
+
+    return size_adj
+
+
+if __name__ == "__main__":
+    import doctest
+
+    doctest.testmod()
diff --git a/server/config.py b/server/config.py