POCA trainer #5005
This reverts commit 292b6ce.
Integrate into CC
* simple rl multiagent env
* runs but does not train
* assemble terminal steps
* seems to train
* fix final reward
* Merge changes
* fix multiple discrete actions
* Lots of small fixes for multiagent env
* Fix just_died
* Add simple RL tests
* Add LSTM simple_rl for COMA
* adding comments to multiagent rl
* Address comments

Co-authored-by: Ervin Teng
Merge branch 'develop-poca-trainer' into develop-coma2-trainer
LGTM, will wait for at least one more reviewer
Approving on the condition that all comments are resolved, including those that GitHub wrongly marked as outdated.
)
return value_outputs, critic_mem_out

def forward(
Remove this method. It has no reason to be public.
# Convert to tensors
current_obs = [ModelUtils.list_to_tensor(obs) for obs in current_obs]
group_obs = GroupObsUtil.from_buffer(batch, n_obs)
group_obs = [
Some of my comments got lost. Please review them in the Conversation tab: #5005 (comment)
if (
    BufferKey.GROUPMATE_REWARDS in mini_batch
    and BufferKey.GROUP_REWARD in mini_batch
):
    if self.add_groupmate_rewards:
Invert these two ifs. There is no need to check the first one if there are no groupmate rewards.
These if conditions could be better:
if self.add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch: add the groupmate rewards
if BufferKey.GROUP_REWARD in mini_batch: add the group reward
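The two independent checks suggested in this review can be sketched as follows. This is a minimal, hypothetical illustration, not the code merged in this PR: `BufferKey` is a stand-in enum holding only the keys used here, `ENVIRONMENT_REWARDS` and `total_extrinsic_rewards` are assumed names, and plain Python lists replace the real buffer and array types.

```python
from enum import Enum
from typing import Dict, List


class BufferKey(Enum):
    # Hypothetical stand-in for ml-agents' BufferKey; only the keys this sketch uses.
    ENVIRONMENT_REWARDS = "environment_rewards"
    GROUPMATE_REWARDS = "groupmate_rewards"
    GROUP_REWARD = "group_reward"


def total_extrinsic_rewards(
    mini_batch: Dict[BufferKey, list], add_groupmate_rewards: bool
) -> List[float]:
    """Hypothetical helper: own reward + optional groupmate rewards + optional group reward."""
    total = [float(r) for r in mini_batch[BufferKey.ENVIRONMENT_REWARDS]]
    # Check the cheap flag first so the buffer lookup is skipped when groupmate rewards are disabled.
    if add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch:
        # Each step stores one reward per groupmate; add their sum.
        for i, mate_rewards in enumerate(mini_batch[BufferKey.GROUPMATE_REWARDS]):
            total[i] += sum(mate_rewards)
    # The group reward is handled independently of the groupmate rewards.
    if BufferKey.GROUP_REWARD in mini_batch:
        for i, group_reward in enumerate(mini_batch[BufferKey.GROUP_REWARD]):
            total[i] += group_reward
    return total


batch = {
    BufferKey.ENVIRONMENT_REWARDS: [1.0, 0.0],
    BufferKey.GROUPMATE_REWARDS: [[0.5, 0.5], [0.0]],
    BufferKey.GROUP_REWARD: [2.0, 2.0],
}
print(total_extrinsic_rewards(batch, add_groupmate_rewards=True))   # [4.0, 2.0]
print(total_extrinsic_rewards(batch, add_groupmate_rewards=False))  # [3.0, 2.0]
```

Splitting the original nested condition this way means a batch with a group reward but no groupmate rewards still receives its group reward, which the combined `and` check would have skipped.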
Updated
return rsa, x_self_encoder

@staticmethod
def encode_observations(
I stand by my statement: make create_residual_self_attention a module, with encode_observations as its forward method.
Call it ObservationEncoder
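The refactor requested here (one object that builds its components once and exposes the encoding step as `forward`) can be sketched without PyTorch. This is a hypothetical illustration of the pattern only: it mimics the `torch.nn.Module` shape (constructor builds sub-components, calling the object runs `forward`) and leaves out the residual self-attention math. The `processors` argument and the concatenation behavior are assumptions, not the PR's implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


class ObservationEncoder:
    """Hypothetical sketch: owns what a factory function used to return and
    replaces a static encode helper with an instance forward() method."""

    def __init__(self, processors: Sequence[Callable[[Vector], Vector]]):
        # Built once at construction instead of being passed around as a tuple.
        self.processors = list(processors)

    def forward(self, observations: Sequence[Vector]) -> Vector:
        if len(observations) != len(self.processors):
            raise ValueError("expected one observation per processor")
        encoded: Vector = []
        # Encode each observation with its own processor and concatenate the results.
        for processor, obs in zip(self.processors, observations):
            encoded.extend(processor(obs))
        return encoded

    # torch.nn.Module routes instance calls to forward(); mimic that here.
    __call__ = forward


enc = ObservationEncoder([lambda v: [2 * x for x in v], lambda v: [sum(v)]])
print(enc([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 4.0, 7.0]
```

Keeping construction and encoding on one object means callers no longer have to thread the factory's return values back into a separate static function.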
…nologies/ml-agents into develop-coma2-trainer
* Make Observation Encoder a module
* Fix copy normalize
Proposed change(s)
This PR adds the POCA trainer and associated tests. In addition, it changes the extrinsic reward provider so that team-based rewards work.
Useful links (GitHub issues, JIRA tickets, ML-Agents forum threads, etc.)
Documentation PR (to be merged after this one): #5056
Explanation of some of the design choices:
Types of change(s)
Checklist
Other comments