docs/Training-Imitation-Learning.md

…from a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFY)
## Recording Demonstrations

It is possible to record demonstrations of agent behavior from the Unity Editor,
and save them as assets. These demonstrations contain information on the
observations, actions, and rewards for a given agent during the recording
session. They can be managed from the Editor, as well as used for training with
Offline Behavioral Cloning (see below).

In order to record demonstrations from an agent, add the `Demonstration Recorder`
component to a GameObject in the scene which contains an `Agent` component. Once
added, it is possible to name the demonstration that will be recorded from the
agent.

<p align="center">
  <img src="images/demo_component.png"
       alt="BC Teacher Helper"
       width="375" border="10" />
</p>

When `Record` is checked, a demonstration will be created whenever the scene is
played from the Editor. Depending on the complexity of the task, anywhere from a
few minutes to a few hours of demonstration data may be necessary to be useful
for imitation learning. When you have recorded enough data, end the Editor play
session, and a `.demo` file will be created in the `Assets/Demonstrations`
folder. This file contains the demonstrations. Clicking on the file will provide
metadata about the demonstration in the inspector.

<p align="center">
  <img src="images/demo_inspector.png" />
</p>
## Training with Behavioral Cloning

There are a variety of possible imitation learning algorithms which can be used;
the simplest of them is Behavioral Cloning. It works by collecting
demonstrations from a teacher, and then simply uses them to directly learn a
policy, in the same way supervised learning for image classification or other
traditional Machine Learning tasks works.
### Offline Training

With offline behavioral cloning, we can use demonstrations (`.demo` files)
generated using the `Demonstration Recorder` as the dataset used to train a
behavior.

1. Choose an agent you would like to have learn to imitate some set of
   demonstrations.
2. Record a set of demonstrations using the `Demonstration Recorder` (see above).
   For illustrative purposes we will refer to this file as `AgentRecording.demo`.
3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
   Control in the Broadcast Hub. For more information on Brains, see
   [here](Learning-Environment-Design-Brains.md).
4. Open the `config/offline_bc_config.yaml` file.
5. Modify the `demo_path` parameter in the file to reference the path to the
   demonstration file recorded in step 2. In our case this is:
   `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
6. Launch `mlagents-learn`, providing `./config/offline_bc_config.yaml`
   as the config parameter, and include the `--run-id` and `--train` as usual.
   Provide your environment as the `--env` parameter if it has been compiled
   as standalone, or omit to train in the editor.
7. (Optional) Observe training performance using Tensorboard.

This will use the demonstration file to train a neural network driven agent to
directly imitate the actions provided in the demonstration. The environment
will launch and be used for evaluating the agent's performance during training.
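
To make steps 4 and 5 concrete, here is a minimal sketch of what the relevant
entry in `config/offline_bc_config.yaml` might look like. Only `demo_path` and
the `AgentRecording.demo` path come from the steps above; the brain name
`YourBrainName`, the `offline_bc` trainer value, and the `batch_size` value are
illustrative assumptions, not prescribed settings:

```yaml
# Sketch of an entry in config/offline_bc_config.yaml (values are illustrative).
YourBrainName:              # assumed placeholder: the name of your Learning Brain
    trainer: offline_bc     # assumed trainer identifier for offline BC
    batch_size: 64          # illustrative hyperparameter value
    demo_path: ./UnitySDK/Assets/Demonstrations/AgentRecording.demo
```

With such a config, the launch in step 6 could look like
`mlagents-learn ./config/offline_bc_config.yaml --run-id=bc_run --train`
(the run id here is an arbitrary example).
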
### Online Training

It is also possible to provide demonstrations in realtime during training,
without pre-recording a demonstration file. The steps to do this are as follows:

1. First create two Brains, one which will be the "Teacher," and the other which
   will be the "Student." We will assume that the names of the Brain
3. The "Student" Brain must be a **Learning Brain**.
4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
   compatible with the agent.
5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
   and check the `Control` checkbox on the "Student" Brain.
6. Link the Brains to the desired Agents (one Agent as the teacher and at least
   one Agent as a student).
7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
   the `trainer` parameter of this entry to `online_bc`, and the
   `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
   Additionally, set `batches_per_epoch`, which controls how much training to do
   each moment. Increase the `max_steps` option if you'd like to keep training
   the Agents for a longer period of time.
8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
   --train --slow`, and press the :arrow_forward: button in Unity when the
   message _"Start training by pressing the Play button in the Unity Editor"_ is
   displayed on the screen.
9. From the Unity window, control the Agent with the Teacher Brain by providing
   "teacher demonstrations" of the behavior you would like to see.
10. Watch as the Agent(s) with the student Brain attached begin to behave
    similarly to the demonstrations.
11. Once the Student Agents are exhibiting the desired behavior, end the training
    process with `CTRL+C` from the command line.
12. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory within Assets of your choosing), and use
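
As a sketch, the "Student" entry described in step 7 might look like the
following in `config/online_bc_config.yaml`. The `online_bc` trainer value,
`brain_to_imitate: Teacher`, `batches_per_epoch`, and `max_steps` are the
parameters named in step 7; the specific numeric values are illustrative
placeholders only:

```yaml
# Sketch of a "Student" Brain entry in config/online_bc_config.yaml.
Student:
    trainer: online_bc          # per step 7
    brain_to_imitate: Teacher   # name of the teacher Brain, per step 7
    batches_per_epoch: 5        # illustrative; controls training done each moment
    max_steps: 50000            # illustrative; increase to keep training longer
```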