POCA trainer #5005

andrewcoh · 2021-02-24T20:55:57Z

Proposed change(s)

This PR adds the POCA trainer and its associated tests. It also changes the extrinsic reward provider so that team-based rewards work.

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

PR for documentation - to be merged after this one #5056
Explanation of some of the design choices:

Types of change(s)

Checklist

Added tests that prove my fix is effective or that my feature works
Updated the changelog (if applicable)
Updated the documentation (if applicable)
Updated the migration guide (if applicable)

Other comments

* simple rl multiagent env
* runs but does not train
* assemble terminal steps
* seems to train
* fix final reward
* Merge changes
* fix multiple discrete actions
* Lots of small fixes for multiagent env
* Fix just_died
* Add simple RL tests
* Add LSTM simple_rl for COMA
* adding comments to multiagent rl
* Address comments

Co-authored-by: Ervin Teng <[email protected]>

ervteng

LGTM, will wait for at least one more reviewer

vincentpierre

I approve once all comments have been resolved, including those that were wrongly marked as outdated by GitHub.

ml-agents/mlagents/trainers/poca/optimizer_torch.py

vincentpierre · 2021-03-11T22:13:58Z

ml-agents/mlagents/trainers/poca/optimizer_torch.py

+            )
+            return value_outputs, critic_mem_out
+
+        def forward(


Remove this method. It has no reason to be public.

vincentpierre · 2021-03-11T22:19:22Z

ml-agents/mlagents/trainers/poca/optimizer_torch.py

+        # Convert to tensors
+        current_obs = [ModelUtils.list_to_tensor(obs) for obs in current_obs]
+        group_obs = GroupObsUtil.from_buffer(batch, n_obs)
+        group_obs = [


Some of my comments got lost. Please review them in the conversation tab: #5005 (comment)

vincentpierre · 2021-03-11T22:25:33Z

ml-agents/mlagents/trainers/torch/components/reward_providers/extrinsic_reward_provider.py

+        if (
+            BufferKey.GROUPMATE_REWARDS in mini_batch
+            and BufferKey.GROUP_REWARD in mini_batch
+        ):
+            if self.add_groupmate_rewards:


Invert these two ifs. There is no need to check the first one if there are no groupmate rewards.

These if conditions could be better:

if self.add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch: do the groupmate reward
if BufferKey.GROUP_REWARD in mini_batch: do the group reward
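
For illustration, the two independent checks suggested above could look roughly like the sketch below. This is not the code from the PR: `BufferKey` here is a small stand-in enum, `combine_rewards` is a hypothetical helper, rewards are plain Python lists rather than tensors, and summing the groupmate and group rewards into the agent's own reward is an assumption made only to keep the example concrete.

```python
from enum import Enum
from typing import Dict, List


class BufferKey(Enum):
    """Hypothetical stand-in for the trainer's buffer keys."""

    ENVIRONMENT_REWARDS = "environment_rewards"
    GROUPMATE_REWARDS = "groupmate_rewards"
    GROUP_REWARD = "group_reward"


def combine_rewards(
    mini_batch: Dict[BufferKey, list], add_groupmate_rewards: bool
) -> List[float]:
    """Combine per-step rewards using two independent, flat conditions."""
    rewards = list(mini_batch[BufferKey.ENVIRONMENT_REWARDS])

    # Check the cheap flag first; the buffer lookup is skipped when it is off.
    if add_groupmate_rewards and BufferKey.GROUPMATE_REWARDS in mini_batch:
        groupmate = mini_batch[BufferKey.GROUPMATE_REWARDS]
        # Assumption: each step holds one reward per groupmate; add their sum.
        rewards = [r + sum(g) for r, g in zip(rewards, groupmate)]

    # The group reward no longer depends on groupmate rewards being present.
    if BufferKey.GROUP_REWARD in mini_batch:
        group = mini_batch[BufferKey.GROUP_REWARD]
        rewards = [r + g for r, g in zip(rewards, group)]

    return rewards
```

Written this way, a mini batch that carries only a group reward is still handled, which the original nested check would have skipped.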

ml-agents/mlagents/trainers/torch/networks.py

vincentpierre · 2021-03-11T22:31:39Z

ml-agents/mlagents/trainers/torch/utils.py

+        return rsa, x_self_encoder
+
+    @staticmethod
+    def encode_observations(


I stand by my statement: make create_residual_self_attention a module, with encode_observations as its forward method.

Call it ObservationEncoder.

Courtesy of @ervteng: #5093

ml-agents/mlagents/trainers/settings.py

Ervin Teng and others added 30 commits December 15, 2020 11:35

Make the env easier

62e9b45

Remove prints

1ebacc1

Make Collab env harder

cb57bf0

Fix group ID

95b3522

Add cc to ghost trainer

afd7476

Add comment to ghost trainer

292b6ce

Revert "Add comment to ghost trainer"

112a9dc

This reverts commit 292b6ce.

Actually add comment to ghosttrainer

783db4c

Scale size of CC network

6c4ba1e

Scale value network based on num agents

d314478

Add 3rd symbol to hallway collab

c7adb93

Make comms one-hot

d2e315d

Fix S tag

5cf76e3

Merge branch 'master' into develop-centralizedcritic-mm

8708f70

Additional changes

44fb8b5

Some more fixes

56f9dbf

Self-attention Centralized Critic

a468075

separate entity encoder and RSA

db184d9

clean up args in mha

32cbdee

more cleanups

c90472c

fixed tests

d429b53

Merge branch 'develop-attention-refactor' into develop-centralizedcritic-mm

44093f2

Merge branch 'develop-attention-refactor' into develop-centralizedcritic-mm

1dc0059

entity embeddings work with no max

2b5b994

Integrate into CC

remove group id

cd84fe3

very rough sketch for TeamManager interface

eed2fce

One layer for entity embed

fe41094

Use 4 heads

3822b18

add defaults to linear encoder, initialize ent encoders

3f4b2b5

Merge branch 'master' into develop-centralizedcritic-mm

c7c7d4c

andrewcoh and others added 7 commits March 10, 2021 15:57

add docstrings to network body

2d0ee89

ource /Users/ervin/.virtualenvs/mlagents-38/bin/activate

8046811

erge branch 'develop-poca-trainer' into develop-coma2-trainer

Update tests

ab6b1d5

rename to MultiAgentNetwork, docstring

0f4201a

fix references to ppo

56548dd

docstrings to poca optimizer

3b91d38

andrewcoh changed the title ~~COMA2 trainer~~ POCA trainer Mar 10, 2021

Ervin T and others added 8 commits March 10, 2021 21:00

Move common loss functions for PPO and POCA (#5079)

20c8759

Turn on the SimpleMultiAgentGroup

2ed7f46

[poca] Remove add_groupmate_rewards from settings (#5082)

8511f9f

Merge branch 'main' into develop-coma2-trainer

445c1f0

Untrack PB Collab Config

f98c615

Update comment and fix reporting of group dones

65af6ff

reduce hybrid sac steps

7f92adc

Merge branch 'main' into develop-coma2-trainer

7058eaa

ervteng approved these changes Mar 11, 2021

vincentpierre approved these changes Mar 11, 2021

Ervin Teng and others added 9 commits March 11, 2021 18:19

Refactor extrinsic reward provider

3ef5f17

Create POCASettings class

83c3187

use group reward + final reward to calculate ELO

461db66

Merge branch 'develop-coma2-trainer' of https://github.com/Unity-Technologies/ml-agents into develop-coma2-trainer

9b1369f

parameter docstring to POCA Value

d50b873

rename group obs to groupmate obs

5d3e500

[poca] Make Observation Encoder a module (#5093)

ff9bd1e

* Make Observation Encoder a module * Fix copy normalize

Get processors out of observation_encoder

100a7ac

rename to groupmate obs (#5094)

10d63ae

andrewcoh merged commit d63a9d7 into main Mar 12, 2021

delete-merged-branch bot deleted the develop-coma2-trainer branch March 12, 2021 01:48

github-actions bot locked as resolved and limited conversation to collaborators Mar 12, 2022
