Commit d3f9fd6

Merge pull request #1588 from Unity-Technologies/hotfix-0.6.0a
Hotfix 0.6.0a to master
2 parents cb0bfa0 + 18528e5 commit d3f9fd6

11 files changed: 127 additions, 64 deletions

UnitySDK/Assets/ML-Agents/Scripts/InferenceBrain/ModelParamLoader.cs

Lines changed: 2 additions & 2 deletions
@@ -411,8 +411,8 @@ private string CheckVisualObsShape(Tensor tensor, int visObsIndex)
             var widthBp = resolutionBp.width;
             var heightBp = resolutionBp.height;
             var pixelBp = resolutionBp.blackAndWhite ? 1 : 3;
-            var widthT = tensor.Shape[1];
-            var heightT = tensor.Shape[2];
+            var heightT = tensor.Shape[1];
+            var widthT = tensor.Shape[2];
             var pixelT = tensor.Shape[3];
             if ((widthBp != widthT) || (heightBp != heightT) || (pixelBp != pixelT))
             {
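The swapped indices are the actual fix here: the visual observation tensor is laid out as [batch, height, width, channels], so `Shape[1]` is the height and `Shape[2]` is the width. The old code compared the Brain's width against the tensor's height dimension, which flagged models with non-square camera resolutions as mismatched. A minimal Python sketch of the corrected comparison (function name, arguments, and message are illustrative only, not the C# API):

    def check_visual_obs_shape(tensor_shape, res_width, res_height, black_and_white):
        # tensor_shape is assumed NHWC: [batch, height, width, channels]
        pixels_bp = 1 if black_and_white else 3
        height_t, width_t, pixels_t = tensor_shape[1], tensor_shape[2], tensor_shape[3]
        if (res_width, res_height, pixels_bp) != (width_t, height_t, pixels_t):
            return ("The Brain expects %dx%dx%d visual observations but the model "
                    "contains %dx%dx%d."
                    % (res_width, res_height, pixels_bp, width_t, height_t, pixels_t))
        return None

    # A 100x60 (width x height) color observation: with the swapped indices this
    # was reported as a mismatch even when the model matched exactly.
    print(check_visual_obs_shape((1, 60, 100, 3), 100, 60, False))  # -> None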

config/curricula/push-block/PushBlockBrain.json

Lines changed: 0 additions & 12 deletions
This file was deleted.

docs/Training-Imitation-Learning.md

Lines changed: 50 additions & 23 deletions
@@ -12,17 +12,31 @@ from a demonstration to learn a policy. [Video Link](https://youtu.be/kpb8ZkMBFY

 ## Recording Demonstrations

-It is possible to record demonstrations of agent behavior from the Unity Editor, and save them as assets. These demonstrations contain information on the observations, actions, and rewards for a given agent during the recording session. They can be managed from the Editor, as well as used for training with Offline Behavioral Cloning (see below).
+It is possible to record demonstrations of agent behavior from the Unity Editor,
+and save them as assets. These demonstrations contain information on the
+observations, actions, and rewards for a given agent during the recording session.
+They can be managed from the Editor, as well as used for training with Offline
+Behavioral Cloning (see below).

-In order to record demonstrations from an agent, add the `Demonstration Recorder` component to a GameObject in the scene which contains an `Agent` component. Once added, it is possible to name the demonstration that will be recorded from the agent.
+In order to record demonstrations from an agent, add the `Demonstration Recorder`
+component to a GameObject in the scene which contains an `Agent` component.
+Once added, it is possible to name the demonstration that will be recorded
+from the agent.

 <p align="center">
   <img src="images/demo_component.png"
        alt="BC Teacher Helper"
        width="375" border="10" />
 </p>

-When `Record` is checked, a demonstration will be created whenever the scene is played from the Editor. Depending on the complexity of the task, anywhere from a few minutes or a few hours of demonstration data may be necessary to be useful for imitation learning. When you have recorded enough data, end the Editor play session, and a `.demo` file will be created in the `Assets/Demonstrations` folder. This file contains the demonstrations. Clicking on the file will provide metadata about the demonstration in the inspector.
+When `Record` is checked, a demonstration will be created whenever the scene
+is played from the Editor. Depending on the complexity of the task, anywhere
+from a few minutes or a few hours of demonstration data may be necessary to
+be useful for imitation learning. When you have recorded enough data, end
+the Editor play session, and a `.demo` file will be created in the
+`Assets/Demonstrations` folder. This file contains the demonstrations.
+Clicking on the file will provide metadata about the demonstration in the
+inspector.

 <p align="center">
   <img src="images/demo_inspector.png"
@@ -33,29 +47,42 @@ When `Record` is checked, a demonstration will be created whenever the scene is

 ## Training with Behavioral Cloning

-There are a variety of possible imitation learning algorithms which can be used,
-the simplest one of them is Behavioral Cloning. It works by collecting demonstrations from a teacher, and then simply uses them to directly learn a policy, in the
-same way the supervised learning for image classification or other traditional
-Machine Learning tasks work.
+There are a variety of possible imitation learning algorithms which can
+be used, the simplest one of them is Behavioral Cloning. It works by collecting
+demonstrations from a teacher, and then simply uses them to directly learn a
+policy, in the same way the supervised learning for image classification
+or other traditional Machine Learning tasks work.


 ### Offline Training

-With offline behavioral cloning, we can use demonstrations (`.demo` files) generated using the `Demonstration Recorder` as the dataset used to train a behavior.
+With offline behavioral cloning, we can use demonstrations (`.demo` files)
+generated using the `Demonstration Recorder` as the dataset used to train a behavior.

 1. Choose an agent you would like to learn to imitate some set of demonstrations.
-2. Record a set of demonstration using the `Demonstration Recorder` (see above). For illustrative purposes we will refer to this file as `AgentRecording.demo`.
-3. Build the scene, assigning the agent a Learning Brain, and set the Brain to Control in the Broadcast Hub. For more information on Brains, see [here](Learning-Environment-Design-Brains.md).
+2. Record a set of demonstration using the `Demonstration Recorder` (see above).
+   For illustrative purposes we will refer to this file as `AgentRecording.demo`.
+3. Build the scene, assigning the agent a Learning Brain, and set the Brain to
+   Control in the Broadcast Hub. For more information on Brains, see
+   [here](Learning-Environment-Design-Brains.md).
 4. Open the `config/offline_bc_config.yaml` file.
-5. Modify the `demo_path` parameter in the file to reference the path to the demonstration file recorded in step 2. In our case this is: `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
-6. Launch `mlagent-learn`, providing `./config/offline_bc_config.yaml` as the config parameter, and include the `--run-id` and `--train` as usual. Provide your environment as the `--env` parameter if it has been compiled as standalone, or omit to train in the editor.
+5. Modify the `demo_path` parameter in the file to reference the path to the
+   demonstration file recorded in step 2. In our case this is:
+   `./UnitySDK/Assets/Demonstrations/AgentRecording.demo`
+6. Launch `mlagent-learn`, providing `./config/offline_bc_config.yaml`
+   as the config parameter, and include the `--run-id` and `--train` as usual.
+   Provide your environment as the `--env` parameter if it has been compiled
+   as standalone, or omit to train in the editor.
 7. (Optional) Observe training performance using Tensorboard.

-This will use the demonstration file to train a neural network driven agent to directly imitate the actions provided in the demonstration. The environment will launch and be used for evaluating the agent's performance during training.
+This will use the demonstration file to train a neural network driven agent
+to directly imitate the actions provided in the demonstration. The environment
+will launch and be used for evaluating the agent's performance during training.

 ### Online Training

-It is also possible to provide demonstrations in realtime during training, without pre-recording a demonstration file. The steps to do this are as follows:
+It is also possible to provide demonstrations in realtime during training,
+without pre-recording a demonstration file. The steps to do this are as follows:

 1. First create two Brains, one which will be the "Teacher," and the other which
    will be the "Student." We will assume that the names of the Brain
@@ -65,27 +92,27 @@ It is also possible to provide demonstrations in realtime during training, witho
 3. The "Student" Brain must be a **Learning Brain**.
 4. The Brain Parameters of both the "Teacher" and "Student" Brains must be
    compatible with the agent.
-5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
+5. Drag both the "Teacher" and "Student" Brain into the Academy's `Broadcast Hub`
   and check the `Control` checkbox on the "Student" Brain.
-4. Link the Brains to the desired Agents (one Agent as the teacher and at least
+6. Link the Brains to the desired Agents (one Agent as the teacher and at least
   one Agent as a student).
-5. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
-   the `trainer` parameter of this entry to `imitation`, and the
+7. In `config/online_bc_config.yaml`, add an entry for the "Student" Brain. Set
+   the `trainer` parameter of this entry to `online_bc`, and the
   `brain_to_imitate` parameter to the name of the teacher Brain: "Teacher".
   Additionally, set `batches_per_epoch`, which controls how much training to do
   each moment. Increase the `max_steps` option if you'd like to keep training
  the Agents for a longer period of time.
-6. Launch the training process with `mlagents-learn config/online_bc_config.yaml
+8. Launch the training process with `mlagents-learn config/online_bc_config.yaml
   --train --slow`, and press the :arrow_forward: button in Unity when the
   message _"Start training by pressing the Play button in the Unity Editor"_ is
   displayed on the screen
-7. From the Unity window, control the Agent with the Teacher Brain by providing
+9. From the Unity window, control the Agent with the Teacher Brain by providing
   "teacher demonstrations" of the behavior you would like to see.
-8. Watch as the Agent(s) with the student Brain attached begin to behave
+10. Watch as the Agent(s) with the student Brain attached begin to behave
   similarly to the demonstrations.
-9. Once the Student Agents are exhibiting the desired behavior, end the training
+11. Once the Student Agents are exhibiting the desired behavior, end the training
   process with `CTL+C` from the command line.
-10. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the
+12. Move the resulting `*.bytes` file into the `TFModels` subdirectory of the
    Assets folder (or a subdirectory within Assets of your choosing) , and use
    with `Learning` Brain.
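Apart from re-wrapping long lines and renumbering the online-training steps (they now run 1 through 12 instead of repeating 4 and 5), the substantive change in this document is step 7: the `trainer` value for the "Student" Brain's entry in `config/online_bc_config.yaml` is now `online_bc` rather than `imitation`. A small sketch of updating that entry programmatically follows; the brain names, the numeric values, and the exact YAML layout are assumptions for illustration, and only the keys named in the doc (`trainer`, `brain_to_imitate`, `batches_per_epoch`, `max_steps`) come from the text:

    import yaml  # PyYAML, already a dependency of the trainers

    path = "config/online_bc_config.yaml"
    with open(path) as f:
        config = yaml.safe_load(f) or {}

    # Hypothetical entry for a student Brain named "Student".
    config["Student"] = {
        "trainer": "online_bc",         # was "imitation" before this hotfix
        "brain_to_imitate": "Teacher",  # name of the teacher Brain
        "batches_per_epoch": 5,         # how much training to do each moment (example value)
        "max_steps": 10000,             # raise this to keep training longer (example value)
    }

    with open(path, "w") as f:
        yaml.safe_dump(config, f, default_flow_style=False)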

ml-agents/mlagents/envs/rpc_communicator.py

Lines changed: 3 additions & 1 deletion
@@ -53,7 +53,9 @@ def create_server(self):
             self.server = grpc.server(ThreadPoolExecutor(max_workers=10))
             self.unity_to_external = UnityToExternalServicerImplementation()
             add_UnityToExternalServicer_to_server(self.unity_to_external, self.server)
-            self.server.add_insecure_port('localhost:' + str(self.port))
+            # Using unspecified address, which means that grpc is communicating on all IPs
+            # This is so that the docker container can connect.
+            self.server.add_insecure_port('[::]:' + str(self.port))
             self.server.start()
             self.is_open = True
         except:
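The functional change is the bind address: 'localhost' only accepts connections from the same host (and, under Docker, from the same container and network namespace), while the unspecified address '[::]' listens on every interface so a Unity process outside the trainer's container can still connect. A standalone sketch of the two bindings (the port is an arbitrary example; this is not the ml-agents communicator itself):

    from concurrent.futures import ThreadPoolExecutor
    import grpc  # grpcio

    port = 5005  # arbitrary example port

    server = grpc.server(ThreadPoolExecutor(max_workers=10))
    # Old behaviour: reachable only from the local host / network namespace.
    # server.add_insecure_port('localhost:' + str(port))
    # New behaviour: listen on all interfaces so clients outside a container can connect.
    server.add_insecure_port('[::]:' + str(port))
    server.start()
    server.stop(0)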

ml-agents/mlagents/trainers/buffer.py

Lines changed: 17 additions & 2 deletions
@@ -28,12 +28,27 @@ class AgentBufferField(list):
             AgentBufferField with the append method.
             """

+            def __init__(self):
+                self.padding_value = 0
+                super(Buffer.AgentBuffer.AgentBufferField, self).__init__()
+
             def __str__(self):
                 return str(np.array(self).shape)

+            def append(self, element, padding_value=0):
+                """
+                Adds an element to this list. Also lets you change the padding
+                type, so that it can be set on append (e.g. action_masks should
+                be padded with 1.)
+                :param element: The element to append to the list.
+                :param padding_value: The value used to pad when get_batch is called.
+                """
+                super(Buffer.AgentBuffer.AgentBufferField, self).append(element)
+                self.padding_value = padding_value
+
             def extend(self, data):
                 """
-                Ads a list of np.arrays to the end of the list of np.arrays.
+                Adds a list of np.arrays to the end of the list of np.arrays.
                 :param data: The np.array list to append.
                 """
                 self += list(np.array(data))
@@ -99,7 +114,7 @@ def get_batch(self, batch_size=None, training_length=1, sequential=True):
                         raise BufferException("The batch size and training length requested for get_batch where"
                                               " too large given the current number of data points.")
                     tmp_list = []
-                    padding = np.array(self[-1]) * 0
+                    padding = np.array(self[-1]) * self.padding_value
                     # The padding is made with zeros and its shape is given by the shape of the last element
                     for end in range(len(self), len(self) % training_length, -training_length)[:batch_size]:
                         tmp_list += [np.array(self[end - training_length:end])]
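Previously `get_batch` always padded short sequences with zeros shaped like the last element. For a field such as `action_mask`, zeros are the wrong neutral value (the new docstring notes action masks should be padded with 1), so `append` now records a per-field `padding_value` and `get_batch` uses it; the `ppo/trainer.py` change further down passes `padding_value=1` for action masks. Note that the in-code comment about the padding being "made with zeros" is now slightly stale. A simplified, self-contained sketch of the idea (this is not the ml-agents `Buffer` class):

    import numpy as np

    class PaddedField(list):
        """List-like field that remembers which value to pad with."""

        def __init__(self):
            super().__init__()
            self.padding_value = 0

        def append(self, element, padding_value=0):
            super().append(element)
            self.padding_value = padding_value

        def padding(self):
            # Shape taken from the last element, value from padding_value.
            return np.full_like(np.array(self[-1]), self.padding_value)

    masks = PaddedField()
    masks.append(np.array([1, 1, 0]), padding_value=1)
    print(masks.padding())  # [1 1 1]: padded with ones rather than zeros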

ml-agents/mlagents/trainers/learn.py

Lines changed: 20 additions & 10 deletions
@@ -6,8 +6,8 @@
 import numpy as np
 from docopt import docopt

-from .trainer_controller import TrainerController
-from .exception import TrainerError
+from mlagents.trainers.trainer_controller import TrainerController
+from mlagents.trainers.exception import TrainerError


 def run_training(sub_id, run_seed, run_options, process_queue):
@@ -107,13 +107,23 @@ def main():

     jobs = []
     run_seed = seed
-    for i in range(num_runs):
+
+    if num_runs == 1:
         if seed == -1:
             run_seed = np.random.randint(0, 10000)
-        process_queue = Queue()
-        p = Process(target=run_training, args=(i, run_seed, options, process_queue))
-        jobs.append(p)
-        p.start()
-        # Wait for signal that environment has successfully launched
-        while process_queue.get() is not True:
-            continue
+        run_training(0, run_seed, options, Queue())
+    else:
+        for i in range(num_runs):
+            if seed == -1:
+                run_seed = np.random.randint(0, 10000)
+            process_queue = Queue()
+            p = Process(target=run_training, args=(i, run_seed, options, process_queue))
+            jobs.append(p)
+            p.start()
+            # Wait for signal that environment has successfully launched
+            while process_queue.get() is not True:
+                continue
+
+# For python debugger to directly run this script
+if __name__ == "__main__":
+    main()
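Two things change here: the relative imports become absolute `mlagents.trainers` imports and a `__main__` guard is added so the module can be launched directly (for example under a debugger), and a single run (`num_runs == 1`) now executes in the main process instead of a spawned `Process`, which keeps breakpoints and `pdb` usable. A minimal, self-contained sketch of that dispatch pattern; the function bodies are stand-ins, not the ml-agents API:

    from multiprocessing import Process, Queue

    def run_training(sub_id, seed, queue):
        # Stand-in for the real trainer: signal that the "environment" launched.
        queue.put(True)
        print("run", sub_id, "with seed", seed)

    def launch(num_runs, seed):
        if num_runs == 1:
            # Stay in the main process so a debugger can step into run_training.
            run_training(0, seed, Queue())
            return
        # Otherwise start one subprocess per run and wait for each launch signal.
        jobs = []
        for i in range(num_runs):
            queue = Queue()
            proc = Process(target=run_training, args=(i, seed + i, queue))
            jobs.append(proc)
            proc.start()
            while queue.get() is not True:
                continue

    if __name__ == "__main__":
        launch(1, 1234)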

ml-agents/mlagents/trainers/policy.py

Lines changed: 1 addition & 0 deletions
@@ -179,6 +179,7 @@ def export_model(self):
                                   clear_devices=True, initializer_nodes='', input_saver='',
                                   restore_op_name='save/restore_all',
                                   filename_tensor_name='save/Const:0')
+        logger.info('Exported ' + self.model_path + '.bytes file')

     def _process_graph(self):
         """

ml-agents/mlagents/trainers/ppo/trainer.py

Lines changed: 1 addition & 1 deletion
@@ -224,7 +224,7 @@ def add_experiences(self, curr_all_info: AllBrainInfo, next_all_info: AllBrainIn
                                 epsilons[idx])
                         else:
                             self.training_buffer[agent_id]['action_mask'].append(
-                                stored_info.action_masks[idx])
+                                stored_info.action_masks[idx], padding_value=1)
                         a_dist = stored_take_action_outputs['log_probs']
                         value = stored_take_action_outputs['value']
                         self.training_buffer[agent_id]['actions'].append(actions[idx])

ml-agents/mlagents/trainers/trainer_controller.py

Lines changed: 33 additions & 13 deletions
@@ -6,6 +6,10 @@
 import glob
 import logging
 import shutil
+import sys
+if sys.platform.startswith('win'):
+    import win32api
+    import win32con

 import yaml
 import re
@@ -103,6 +107,7 @@ def __init__(self, env_path, run_id, save_freq, curriculum_folder,
         self.keep_checkpoints = keep_checkpoints
         self.trainers = {}
         self.seed = seed
+        self.global_step = 0
         np.random.seed(self.seed)
         tf.set_random_seed(self.seed)
         self.env = UnityEnvironment(file_name=env_path,
@@ -181,6 +186,23 @@ def _save_model(self,steps=0):
             self.trainers[brain_name].save_model()
         self.logger.info('Saved Model')

+    def _save_model_when_interrupted(self, steps=0):
+        self.logger.info('Learning was interrupted. Please wait '
+                         'while the graph is generated.')
+        self._save_model(steps)
+
+    def _win_handler(self, event):
+        """
+        This function gets triggered after ctrl-c or ctrl-break is pressed
+        under Windows platform.
+        """
+        if event in (win32con.CTRL_C_EVENT, win32con.CTRL_BREAK_EVENT):
+            self._save_model_when_interrupted(self.global_step)
+            self._export_graph()
+            sys.exit()
+            return True
+        return False
+
     def _export_graph(self):
         """
         Exports latest saved models to .bytes format for Unity embedding.
@@ -288,12 +310,14 @@ def start_learning(self):
         self._initialize_trainers(trainer_config)
         for _, t in self.trainers.items():
             self.logger.info(t)
-        global_step = 0 # This is only for saving the model
         curr_info = self._reset_env()
         if self.train_model:
             for brain_name, trainer in self.trainers.items():
                 trainer.write_tensorboard_text('Hyperparameters',
                                                trainer.parameters)
+        if sys.platform.startswith('win'):
+            # Add the _win_handler function to the windows console's handler function list
+            win32api.SetConsoleCtrlHandler(self._win_handler, True)
         try:
             while any([t.get_step <= t.get_max_steps \
                        for k, t in self.trainers.items()]) \
@@ -353,31 +377,27 @@ def start_learning(self):
                     # Write training statistics to Tensorboard.
                     if self.meta_curriculum is not None:
                         trainer.write_summary(
-                            global_step,
+                            self.global_step,
                             lesson_num=self.meta_curriculum
                                 .brains_to_curriculums[brain_name]
                                 .lesson_num)
                     else:
-                        trainer.write_summary(global_step)
+                        trainer.write_summary(self.global_step)
                     if self.train_model \
                             and trainer.get_step <= trainer.get_max_steps:
                         trainer.increment_step_and_update_last_reward()
-                global_step += 1
-                if global_step % self.save_freq == 0 and global_step != 0 \
+                self.global_step += 1
+                if self.global_step % self.save_freq == 0 and self.global_step != 0 \
                         and self.train_model:
                     # Save Tensorflow model
-                    self._save_model(steps=global_step)
+                    self._save_model(steps=self.global_step)
                 curr_info = new_info
             # Final save Tensorflow model
-            if global_step != 0 and self.train_model:
-                self._save_model(steps=global_step)
+            if self.global_step != 0 and self.train_model:
+                self._save_model(steps=self.global_step)
         except KeyboardInterrupt:
-            print('--------------------------Now saving model--------------'
-                  '-----------')
             if self.train_model:
-                self.logger.info('Learning was interrupted. Please wait '
-                                 'while the graph is generated.')
-                self._save_model(steps=global_step)
+                self._save_model_when_interrupted(steps=self.global_step)
             pass
         self.env.close()
         if self.train_model:
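The controller now keeps the step counter on `self` (`self.global_step`) so an interrupt handler can reach it, and on Windows it registers a console control handler through pywin32, presumably because Ctrl-C may not surface as a Python `KeyboardInterrupt` while a blocking native call (such as a TensorFlow session step) is in progress. Both the Windows handler and the `KeyboardInterrupt` path now save via the shared `_save_model_when_interrupted` helper. A stripped-down sketch of the registration pattern (requires the `pywin32` package on Windows; the prints are stand-ins for saving and exporting the model):

    import sys
    import time

    if sys.platform.startswith('win'):
        import win32api
        import win32con

        def _win_handler(event):
            # Windows calls this on Ctrl-C / Ctrl-Break.
            if event in (win32con.CTRL_C_EVENT, win32con.CTRL_BREAK_EVENT):
                print('Interrupted: saving the model before exiting...')
                sys.exit()
            return False

        # True prepends the handler to the console's list of control handlers.
        win32api.SetConsoleCtrlHandler(_win_handler, True)

    try:
        for _ in range(60):  # stand-in for the training loop
            time.sleep(1)
    except KeyboardInterrupt:
        print('Interrupted: saving the model before exiting...')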
