[Feature Request] Refactor key usage of loss modules #1174

vmoens · 2023-05-22T08:00:40Z

We have various loss modules in RL.
They work as

loss_module = LossModule(network, …)
loss_module(data)

data is a TensorDict instance. The loss reads a number of keys from it.
Some are defined by the network (the loss will simply call network(data)).
For other operations, it is the LossModule itself that reads the keys from the tensordict. For instance:

rl/torchrl/objectives/ddpg.py

Line 139 in 714d645

return -td_copy.get("state_action_value")

rl/torchrl/objectives/ppo.py

Line 188 in d6a466d

prev_log_prob = tensordict.get("sample_log_prob")

What we would like to do is to have a way to tell the loss module where to find these keys, something like

loss_module.set_keys(sample_log_prob="some_other_key")

This set_keys would accept a limited set of arguments for each loss module and store, in a private attribute, the key that points to the value we want.

This way the loss would be 100% oblivious to the user's choice of key names, while still providing default values for easier integration.

It would also remove the key names from the __init__ method, whose signature they currently pollute.
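A minimal sketch of the idea, assuming a plain dict in place of the TensorDict. The class name `SketchLoss`, the `_DEFAULT_KEYS` table and the `_keys` attribute are hypothetical and only illustrate the proposal; they are not the torchrl implementation.

```python
class SketchLoss:
    """Hypothetical stand-in for a loss module (not the torchrl API)."""

    # The limited set of key names this loss accepts, with their defaults.
    _DEFAULT_KEYS = {
        "sample_log_prob": "sample_log_prob",
        "advantage": "advantage",
    }

    def __init__(self):
        # Private attribute mapping each key name to where it lives in the data.
        self._keys = dict(self._DEFAULT_KEYS)

    def set_keys(self, **kwargs):
        # Reject names this loss does not know about, then store the override.
        for name, key in kwargs.items():
            if name not in self._DEFAULT_KEYS:
                raise ValueError(f"Unknown key name: {name!r}")
            self._keys[name] = key

    def forward(self, data):
        # The loss never hard-codes the key; it reads it from the mapping.
        return data[self._keys["sample_log_prob"]]
```

With the defaults, `SketchLoss().forward(data)` reads `data["sample_log_prob"]`; after `set_keys(sample_log_prob="some_other_key")` it reads `data["some_other_key"]` instead.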

Action items

For each loss, implement set_keys and document which keys can be set on a case-by-case basis.

In each constructor, raise a deprecation warning if users pass a key,
e.g. here

rl/torchrl/objectives/ppo.py

Lines 125 to 127 in d6a466d

    
           advantage_key: str = "advantage", 
        
           value_target_key: str = "value_target", 
        
           value_key: str = "state_value",

Write tests for each loss to check that this works as expected.
We should do the same for the value modules (here https://github.com/pytorch/rl/blob/d6a466da6b403a6ec87bbb633d0249cbd824475e/torchrl/objectives/value/advantages.py)
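The constructor-side action item could look roughly like the sketch below. It assumes a hypothetical `SketchPPOLoss` class with a single `advantage_key` argument and a `_keys` dict; the warning text and the structure are illustrative, not the code that was eventually merged.

```python
import warnings


class SketchPPOLoss:
    """Hypothetical stand-in for a loss constructor (not the torchrl code)."""

    def __init__(self, advantage_key=None):
        # Default key, matching the default shown in the signature above.
        self._keys = {"advantage": "advantage"}
        if advantage_key is not None:
            # Old-style usage: still honoured, but flagged as deprecated.
            warnings.warn(
                "Passing advantage_key to the constructor is deprecated; "
                "use set_keys(advantage=...) instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            self.set_keys(advantage=advantage_key)

    def set_keys(self, **kwargs):
        # Only names known to this loss may be overridden.
        for name, key in kwargs.items():
            if name not in self._keys:
                raise ValueError(f"Unknown key name: {name!r}")
            self._keys[name] = key
```

A test for each loss can then check two things: that passing the key to the constructor emits a `DeprecationWarning`, and that the key still ends up in the private mapping.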

Context

To give some context: this will be used when users (for instance in multi-agent RL) work with nested keys:

reward = data['next', 'agents', 'reward']

which is unconventional but should be supported.
So we want to be able to tell the loss: look, here the reward is not in ('next', 'reward').
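To illustrate the nested-key case, here is a small sketch in which a plain nested dict stands in for the TensorDict. The helper `get_nested` and the class `SketchValueLoss` are hypothetical names introduced only for this example.

```python
def get_nested(data, key):
    """Look up a string key or a tuple of strings in a nested dict."""
    if isinstance(key, str):
        key = (key,)
    for part in key:
        data = data[part]
    return data


class SketchValueLoss:
    """Hypothetical loss that reads the reward from a configurable key."""

    def __init__(self):
        # Conventional default location of the reward.
        self._reward_key = ("next", "reward")

    def set_keys(self, reward):
        # Accept either a string or a tuple of strings (a nested key).
        self._reward_key = reward

    def read_reward(self, data):
        return get_nested(data, self._reward_key)
```

Calling `set_keys(reward=("next", "agents", "reward"))` redirects the lookup to the multi-agent layout without touching the rest of the loss.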

cc @Blonck @matteobettini


vmoens added the enhancement New feature or request label May 22, 2023

vmoens self-assigned this May 22, 2023

Blonck self-assigned this May 22, 2023

Blonck changed the title ~~[Feature Request]~~ [Feature Request] Refactor key usage of loss modules May 22, 2023

Blonck mentioned this issue May 22, 2023

[Refactor] the usage of tensordict keys in loss modules #1175

Merged

7 tasks

vmoens closed this as completed in #1175 May 31, 2023
