References
[1] Lawrence M Ausubel and Raymond J Deneckere. A generalized theorem of the maximum. Economic Theory, 3(1):99–107, 1993.
[2] Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
[3] Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip HS Torr, Mingfei Sun, and Shimon Whiteson. Is independent learning all you need in the StarCraft multi-agent challenge? arXiv preprint arXiv:2011.09533, 2020.
[4] Christian Schröder de Witt, Bei Peng, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, and Shimon Whiteson. Deep multi-agent reinforcement learning for decentralized continuous cooperative control. arXiv preprint arXiv:2003.06709, 2020.
[5] Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, and Aleksander Madry. Implementation matters in deep policy gradients: A case study on PPO and TRPO. arXiv preprint arXiv:2005.12729, 2020.
[6] Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32, 2018.
[7] Chloe Ching-Yun Hsu, Celestine Mendler-Dünner, and Moritz Hardt. Revisiting design choices in proximal policy optimization. arXiv preprint arXiv:2009.10897, 2020.
[8] Jakub Grudzien Kuba, Ruiqing Chen, Muning Wen, Ying Wen, Fanglei Sun, Jun Wang, and Yaodong Yang. Trust region policy optimisation in multi-agent reinforcement learning. In International Conference on Learning Representations, 2022.
[9] Jakub Grudzien Kuba, Christian Schroeder de Witt, and Jakob Foerster. Mirror learning: A unifying framework of policy optimisation. In International Conference on Machine Learning, 2022.
[10] Jakub Grudzien Kuba, Muning Wen, Linghui Meng, Haifeng Zhang, David Mguni, Jun Wang, Yaodong Yang, et al. Settling the variance of multi-agent policy gradients. Advances in Neural Information Processing Systems, 34:13458–13470, 2021.
[11] Hepeng Li and Haibo He. Multi-agent trust region policy optimization. arXiv preprint arXiv:2010.07916, 2020.
[12] Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
[13] Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In
Machine learning proceedings 1994, pages 157–163. Elsevier, 1994.
[14] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 6382–6393, 2017.
[16] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International conference on machine learning, pages 1928–1937. PMLR, 2016.
[17] John Nash. Non-cooperative games. Annals of mathematics, pages 286–295, 1951.