
Commit 052361d

add training code
1 parent 9b969ca commit 052361d

51 files changed, +7760 −171 lines

research/efficient-hrl/README.md

+42 −8
Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1805.08296).

"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1810.01257).

Requirements:
* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
* Gin Config (see https://github.com/google/gin-config)
* TensorFlow Agents (see https://github.com/tensorflow/agents)
* OpenAI Gym (see http://gym.openai.com/docs; be sure to install MuJoCo as well)
* NumPy (see http://www.numpy.org/)


Quick Start:

Run a training job based on the original HIRO paper on Ant Maze:

```
python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
```

Run a continuous evaluation job for that experiment:

```
python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
```

To run the same experiment with online representation learning (the
"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`.
You can also run with `hiro_xy` to run the same experiment with HIRO on only the
xy coordinates of the agent.

To run on other environments, change `ant_maze` to something else; e.g.,
`ant_push_multi`, `ant_fall_multi`, etc. See `context/configs/*` for other options.
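
For example, following the same argument layout as the commands above (the
first argument simply names the experiment):

```
# Illustrative invocations; the experiment names (test2, ...) are placeholders.
python scripts/local_train.py test2 hiro_repr ant_maze base_uvf suite
python scripts/local_train.py test3 hiro_xy ant_maze base_uvf suite
python scripts/local_train.py test4 hiro_orig ant_push_multi base_uvf suite
```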


Basic Code Guide:

The code for training resides in `train.py`. The code trains a lower-level
policy (a UVF agent in the code) and a higher-level policy (a MetaAgent in the
code) concurrently. The higher-level policy communicates goals to the
lower-level policy; in the code, such a goal is called a context. Not only does
the lower-level policy act with respect to a context (a goal specified by the
higher-level policy), but the higher-level policy also acts with respect to an
environment-specified context (corresponding to the navigation target location
associated with the task). Therefore, in `context/configs/*` you will find
specifications for both task setup and goal configurations. Most remaining
hyperparameters used for training/evaluation may be found in `configs/*`.
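
As a rough illustration of this structure (not the actual classes or API in
`train.py`; all names, dimensions, and the re-goaling interval below are
placeholders):

```
# Illustrative sketch only -- NOT the classes used in train.py. It mimics the
# structure described above: a MetaAgent-like high-level policy that re-emits a
# goal (a "context") every few steps, and a UVF-like low-level policy that acts
# conditioned on the current goal.
import numpy as np


class HighLevelPolicy:
    """Stand-in for the MetaAgent: maps (state, task context) -> goal."""

    def __init__(self, goal_dim):
        self.goal_dim = goal_dim

    def sample_goal(self, state, task_context):
        # A real agent would use a learned policy conditioned on both inputs.
        return np.random.uniform(-1.0, 1.0, self.goal_dim)


class LowLevelPolicy:
    """Stand-in for the UVF agent: maps (state, goal) -> action."""

    def __init__(self, action_dim):
        self.action_dim = action_dim

    def act(self, state, goal):
        # A real agent would condition its action on the state and the goal.
        return np.random.uniform(-1.0, 1.0, self.action_dim)


def rollout(env_step, state, task_context, steps=500, goal_every=10):
    """High level picks a goal every `goal_every` steps; low level acts on it."""
    high = HighLevelPolicy(goal_dim=2)
    low = LowLevelPolicy(action_dim=8)
    goal = None
    for t in range(steps):
        if t % goal_every == 0:
            goal = high.sample_goal(state, task_context)
        state = env_step(state, low.act(state, goal))
    return state
```

In the actual code both policies are additionally trained concurrently; the
sketch only shows how goals flow from the higher level to the lower level
during a rollout.
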
NOTE: Not all the code corresponding to the "Near-Optimal" paper is included.
Namely, changes to low-level policy training proposed in the paper (discounting
and auxiliary rewards) are not implemented here. Performance should not change
significantly.

Maintained by Ofir Nachum (ofirnachum).
