Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1805.08296).

"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1810.01257).


Requirements:
* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
* Gin Config (see https://github.com/google/gin-config)
* Tensorflow Agents (see https://github.com/tensorflow/agents)
* OpenAI Gym (see http://gym.openai.com/docs, be sure to install MuJoCo as well)
* NumPy (see http://www.numpy.org/)
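
As a rough install sketch (package names are a best guess and versions are not
pinned by this README; follow each linked project's own instructions, and set
up MuJoCo separately for the Gym environments):

```
pip install tensorflow numpy gym gin-config
# Tensorflow Agents and MuJoCo: see the links above for install instructions.
```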


Quick Start:

Run a training job based on the original HIRO paper on Ant Maze:

```
python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
```

Run a continuous evaluation job for that experiment:

```
python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
```

To run the same experiment with online representation learning (the
"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`. You can also use
`hiro_xy` to run HIRO on only the xy coordinates of the agent.

To run on other environments, change `ant_maze` to something else, e.g.,
`ant_push_multi` or `ant_fall_multi`. See `context/configs/*` for other options.
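
For example, to train and evaluate the representation-learning variant on Ant
Push under a separate experiment name (`test2` here is just an arbitrary
label), the Quick Start commands become:

```
python scripts/local_train.py test2 hiro_repr ant_push_multi base_uvf suite
python scripts/local_eval.py test2 hiro_repr ant_push_multi base_uvf suite
```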


Basic Code Guide:

The code for training resides in `train.py`. It trains a lower-level policy (a
UVF agent in the code) and a higher-level policy (a MetaAgent in the code)
concurrently. The higher-level policy communicates goals to the lower-level
policy; in the code, such a goal is called a context. The lower-level policy
acts with respect to a context (a goal specified by the higher level), and the
higher-level policy in turn acts with respect to an environment-specified
context (the navigation target location associated with the task). Accordingly,
`context/configs/*` contains both the task setups and the goal configurations,
while most remaining hyperparameters used for training/evaluation can be found
in `configs/*`.
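
To make the division of labor concrete, here is a minimal, self-contained toy
sketch of that two-level interaction loop. It is not the repository's actual
API; all names in it (`ToyEnv`, `meta_policy`, `uvf_policy`, `META_PERIOD`) are
made up for illustration, and no learning is performed.

```
# Toy sketch only -- the environment and policy names are illustrative,
# not the classes used in train.py.
import numpy as np

META_PERIOD = 10  # the higher level emits a new goal every META_PERIOD steps


class ToyEnv(object):
    """Stand-in 2-D point environment; the real code uses MuJoCo ant tasks."""

    def __init__(self, target=(8.0, 8.0)):
        self.target = np.asarray(target, dtype=np.float64)
        self.state = np.zeros(2)

    def reset(self):
        self.state = np.zeros(2)
        return self.state.copy()

    def step(self, action):
        self.state = self.state + np.clip(action, -1.0, 1.0)
        task_reward = -np.linalg.norm(self.state - self.target)
        return self.state.copy(), task_reward


def meta_policy(state, env_context):
    """Higher level: given the environment-specified context (the navigation
    target), propose a nearby goal (context) for the lower level."""
    return state + np.clip(env_context - state, -2.0, 2.0)


def uvf_policy(state, goal):
    """Lower level: act so as to move toward the goal set by the higher level."""
    return np.clip(goal - state, -1.0, 1.0)


env = ToyEnv()
state = env.reset()
goal = meta_policy(state, env.target)

for t in range(100):
    if t % META_PERIOD == 0:
        goal = meta_policy(state, env.target)    # high-level decision
    action = uvf_policy(state, goal)             # low-level decision
    state, task_reward = env.step(action)
    goal_reward = -np.linalg.norm(goal - state)  # intrinsic goal-reaching reward
    # In train.py both policies are trained concurrently: the MetaAgent from
    # the task reward and the UVF agent from its goal-reaching reward.

print("final position:", state, "target:", env.target)
```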

NOTE: Not all of the code corresponding to the "Near-Optimal" paper is
included. In particular, the changes to low-level policy training proposed in
that paper (discounting and auxiliary rewards) are not implemented here.
Performance should not change significantly.


Maintained by Ofir Nachum (ofirnachum).