
added multi armed bandit problem with three strategies to solve it #12668


Open
wants to merge 5 commits into base: master

Conversation


@sephml sephml commented Apr 11, 2025

Describe your change:

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests? -- Note: Please avoid changing both code and tests in a single pull request.
  • Documentation change?

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.
  • If this pull request resolves one or more open issues then the description above includes the issue number(s) with a closing keyword: "Fixes #ISSUE-NUMBER".

What is added?

Multi-armed bandits (MAB) are a class of sequential decision-making problems in which an agent repeatedly chooses among multiple actions (or "arms") with uncertain rewards, aiming to maximize cumulative reward by balancing exploration (gathering information about each arm) and exploitation (favoring arms known to pay well). It is one of the foundational problems in reinforcement learning and optimization, as it captures the exploration-exploitation trade-off that underpins many decision-making processes. MAB strategies such as epsilon-greedy, Upper Confidence Bound (UCB), and Thompson Sampling are widely applied in recommendation systems, adaptive clinical trials, online advertising, and resource allocation, optimizing real-world decisions under uncertainty with minimal data collection.
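The three strategies named above can be sketched on a simple Bernoulli bandit (each arm pays 1 with a fixed unknown probability, else 0). This is an illustrative sketch only, not the code added by this PR; the class and function names (`BernoulliBandit`, `pull`, etc.) are assumptions for the example.

```python
import math
import random


class BernoulliBandit:
    """k arms; arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs):
        self.probs = probs

    def pull(self, arm):
        return 1 if random.random() < self.probs[arm] else 0


def epsilon_greedy(bandit, steps, epsilon=0.1):
    """Explore a random arm with probability epsilon, else exploit the best estimate."""
    k = len(bandit.probs)
    counts = [0] * k
    values = [0.0] * k  # running mean reward per arm
    total = 0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(k)  # explore
        else:
            arm = max(range(k), key=lambda i: values[i])  # exploit
        reward = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total


def ucb1(bandit, steps):
    """Pick the arm with the highest mean plus a confidence bonus (UCB1)."""
    k = len(bandit.probs)
    counts = [0] * k
    values = [0.0] * k
    total = 0
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # play each arm once first
        else:
            arm = max(
                range(k),
                key=lambda i: values[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total


def thompson_sampling(bandit, steps):
    """Sample each arm's success rate from a Beta posterior and play the best sample."""
    k = len(bandit.probs)
    successes = [1] * k  # Beta(1, 1) uniform priors
    failures = [1] * k
    total = 0
    for _ in range(steps):
        samples = [random.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = bandit.pull(arm)
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1
        total += reward
    return total


if __name__ == "__main__":
    random.seed(0)
    bandit = BernoulliBandit([0.2, 0.5, 0.8])
    for strategy in (epsilon_greedy, ucb1, thompson_sampling):
        print(strategy.__name__, strategy(bandit, 2000))
```

All three converge toward pulling the 0.8 arm; they differ in how the exploration bonus is computed (fixed randomness, a log-confidence term, or posterior sampling).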

@algorithms-keeper algorithms-keeper bot added the tests are failing Do not merge until tests pass label Apr 11, 2025
@algorithms-keeper algorithms-keeper bot removed tests are failing Do not merge until tests pass labels Apr 13, 2025