
Fixed epsilon decay in dqn example #117


Merged
merged 7 commits into Lightning-AI:main on Dec 4, 2021

Conversation

@carsondenison (Contributor) commented Dec 3, 2021

Before submitting

  • I raised this on GitHub as issue pytorch-lightning#10883 ("Found mistake in pytorch-lightning DQN example. How do I upload a fix?"). @awaelchli suggested that I fork the repo, clone it to my local machine, make my changes, push to my fork, and then submit a pull request.
  • This is my first attempt to contribute to an open-source project, so please let me know if I've done anything incorrectly.
  • I could not find where the unit tests for the example Colab notebooks are kept. I would be happy to upload my unit tests for the get_epsilon method, but I don't know where they belong in the codebase.

What does this PR do?

This fixes issue Lightning-AI/pytorch-lightning#10883, where the epsilon decay for the DQN's epsilon-greedy policy was incorrect. The original code used self.hparams.eps_start for a single training step and then immediately switched to self.hparams.eps_end. The intended behavior is to decrease epsilon linearly from self.hparams.eps_start to self.hparams.eps_end over the first self.hparams.eps_last_frame steps. I wrote a small function, get_epsilon, which fixes this logic and returns the correct epsilon.
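For reference, a minimal sketch of the linear schedule described above, assuming get_epsilon is a method on the LightningModule and that self.global_step counts training steps (the exact signature in the merged notebook may differ):

```python
def get_epsilon(self, start: float, end: float, frames: int) -> float:
    """Linearly anneal epsilon from `start` to `end` over the first `frames` steps."""
    if self.global_step >= frames:
        return end
    # Fraction of the decay window already elapsed scales the start-to-end drop.
    return start - (self.global_step / frames) * (start - end)
```

Each training step would then call it as self.get_epsilon(self.hparams.eps_start, self.hparams.eps_end, self.hparams.eps_last_frame).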

I have also made a few minor changes on other lines, because the code would not run on my local machine without them (sketches below). Specifically, the return type hint for the __iter__ method of the RLDataset class was Tuple, but it should be Iterator[Tuple], because the method is a generator yielding (state, action, reward, done, new_state) tuples. Additionally, on line 264 (formerly 276), I got an error because the index for the gather() function must be of type int64, so I cast the actions tensor to long. Finally, I added logging of self.episode_reward and epsilon, so that I could verify that a) the agent still learns successfully, and b) my changes to epsilon work as intended.
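Hedged sketches of those auxiliary fixes, with the class, attribute, and tensor names assumed from the DQN notebook:

```python
from typing import Iterator, Tuple

from torch.utils.data.dataset import IterableDataset


class RLDataset(IterableDataset):
    """Samples experience tuples from a replay buffer."""

    def __iter__(self) -> Iterator[Tuple]:  # was annotated as -> Tuple
        states, actions, rewards, dones, new_states = self.buffer.sample(self.sample_size)
        for i in range(len(dones)):
            yield states[i], actions[i], rewards[i], dones[i], new_states[i]
```

The gather() cast and the added logging would look roughly like this inside the training step, where epsilon is whatever get_epsilon returned for the current step:

```python
# gather() requires an int64 index tensor, so cast actions to long
state_action_values = self.net(states).gather(1, actions.long().unsqueeze(-1)).squeeze(-1)

# log the running episode reward and the current epsilon for inspection
self.log("episode_reward", self.episode_reward)
self.log("epsilon", epsilon)
```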

PR review

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Sure did!

@Borda added the bug / fix label Dec 3, 2021
@Borda enabled auto-merge (squash) Dec 4, 2021 23:13
@Borda merged commit b3762a2 into Lightning-AI:main Dec 4, 2021