Hi there! Thanks for this excellent repo! The code is really nice and a great lifesaver for other researchers!
I am trying to reproduce the results in the paper "Deep Multi-Agent Reinforcement Learning for Decentralised Continuous Cooperative Control" and find that there is a performance gap between my results and the reported ones. I suspect this might be due to an oversight on my part regarding the hyper-parameters, so I am asking for help in this issue.
Apart from using the default `comix.yaml` and `particle.yaml` configs, I additionally set these parameters in the config:
```yaml
# According to paper Appendix F.1
batch_size: 1024
gamma: 0.85
lr: 0.01
rnn_hidden_dim: 64
t_max: 2000000
test_interval: 2000
save_model: True
save_model_interval: 200000
```
And this is the result of COMIX in the Continuous Predator-Prey environment over 8 repetitions (identical config, just repeated runs):

For reference, this is the learning curve from the original paper:

In detail, I find that the final performance (episode return) of the 8 trials varies drastically:
244.5
2.0
248.0
207.25
1.0
271.25
274.0
163.5
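To quantify the spread, here is a quick sketch (plain Python, using the exact trial values listed above) that computes the mean and standard deviation of the 8 final returns and counts the apparently collapsed runs; the cutoff of 50 is an arbitrary threshold I chose to separate the two near-zero runs from the rest:

```python
import statistics

# Final episode returns of the 8 COMIX trials listed above
returns = [244.5, 2.0, 248.0, 207.25, 1.0, 271.25, 274.0, 163.5]

mean = statistics.mean(returns)          # average over all trials
stdev = statistics.pstdev(returns)       # population standard deviation
failed = [r for r in returns if r < 50]  # heuristic cutoff for "collapsed" runs

print(f"mean={mean:.2f}, stdev={stdev:.2f}, "
      f"collapsed runs={len(failed)}/{len(returns)}")
```

The pattern looks bimodal: six runs reach returns above 160 while two stay near zero, which suggests occasional training collapse on some seeds rather than uniform underperformance.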
so I guess there is some hyper-parameter I have overlooked that causes some trials to fail.
Could anyone provide some suggestions on this issue? Thanks!