Hi! I'm an AI researcher, working on the next big thing 🤫.

Most recently, I led a team that developed the AI Economist, which uses deep reinforcement learning and super-fast, data-driven multi-agent simulations for economic analysis and policy design to improve social welfare.

My research has been covered in US media, e.g., MIT Technology Review, and international media, e.g., the Financial Times (UK), het Financieele Dagblad, de Volkskrant (NL), Fast Company World-Changing Ideas. I was also on Dutch radio to talk about the AI Economist!

Before working on AI, I was a theoretical physicist. You can always wake me up for superstrings and topological branes.

I grew up in the Netherlands (gezellig!), and have lived in the UK and the US.



Mitigating climate change requires international cooperation. But without a central authority, nations need to negotiate and reach agreements of their own volition. Which protocols and agreements are self-incentivizing and lead to better climate outcomes? Find out more at AI for Global Climate Cooperation!

AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
Tianyu Zhang, Andrew Williams, Soham Phade, Sunil Srinivasa, Yang Zhang, Prateek Gupta, Yoshua Bengio, Stephan Zheng.
[Paper] [Website]

A position paper that surveys the potential of next-gen simulations.

Simulation Intelligence: Towards a New Generation of Scientific Methods
Alexander Lavin, Hector Zenil, Brooks Paige, David Krakauer, Justin Gottschlich, Tim Mattson, Anima Anandkumar, Sanjay Choudry, Kamil Rocki, Atılım Güneş Baydin, Carina Prunkl, Olexandr Isayev, Erik Peterson, Peter L. McMahon, Jakob Macke, Kyle Cranmer, Jiaxin Zhang, Haruko Wainwright, Adi Hanuka, Manuela Veloso, Samuel Assefa, Stephan Zheng, Avi Pfeffer

On a framework to develop and deploy machine learning systems.

Technology readiness levels for machine learning systems
Alexander Lavin, Ciarán M. Gilligan-Lee, Alessya Visnjic, Siddha Ganju, Dava Newman, Sujoy Ganguly, Danny Lange, Atılım Güneş Baydin, Amit Sharma, Adam Gibson, Stephan Zheng, Eric P. Xing, Chris Mattmann, James Parr & Yarin Gal

We released WarpDrive, an open-source framework for deep multi-agent RL on a GPU. It's orders of magnitude faster than CPU + GPU solutions, and scales well across thousands of environments and agents. In 2D Tag, we get ~3 million steps per second with 2000 parallel environments and 1000 agents!

WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU
Tian Lan*, Sunil Srinivasa*, Huan Wang, Stephan Zheng.
[Paper] [Blog post] [Code]

Using hierarchical curricula, we can find approximate game-theoretic equilibria in real business cycle models that include consumers, firms, and a government. These are stylized macroeconomic models of the real world.

Finding General Equilibria in Many-Agent Economic Simulations using Deep Reinforcement Learning
Michael Curry, Alex Trott, Soham Phade, Yu Bai, Stephan Zheng.
[Paper] [Code]

Time and energy are scarce. The AI Economist can also analyze organizations with rationally inattentive agents that pay a cost to observe information, and therefore need to choose what to pay attention to. This is a form of bounded rationality, which models human behavior.

Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally Inattentive Reinforcement Learning
Tong Mu, Alex Trott, Stephan Zheng.
[Paper (coming soon)] [Code]

The AI Economist learns public health and economic policies in a data-driven pandemic-economic simulation of COVID-19. AI policies can halve deaths compared to the real world, while maintaining similar unemployment levels. The policies are also interpretable and robust.

Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist
Alexander Trott*, Sunil Srinivasa*, Douwe van der Wal, Sebastien Haneuse, Stephan Zheng*.
[Paper] [Web Version] [Web Demo] [Blog post] [Code]

The AI Economist uses reinforcement learning to learn tax policies in AI simulations. AI tax policies yield at least 16% higher equality-times-productivity than prominent tax models. It also shows promising results in human studies, in which real people earn real money in our economic simulation.

The AI Economist: Taxation policy design via two-level deep multiagent reinforcement learning
Stephan Zheng*, Alexander Trott*, Sunil Srinivasa, David C. Parkes, Richard Socher. Science Advances, May 2022.

The AI Economist: Improving Equality and Productivity with AI-Driven Tax Policies
Stephan Zheng*, Alexander Trott*, Sunil Srinivasa, Nikhil Naik, Melvin Gruesbeck, David C. Parkes, Richard Socher.
[Press Release] [Blog] [PDF]

The AI Economist: Optimal Economic Policy Design via Two-level Deep Reinforcement Learning
Stephan Zheng*, Alexander Trott*, Sunil Srinivasa, David C. Parkes, Richard Socher.

AI can talk about physics! We train deep language models to generate descriptions of physics. 👉👉 Next stop: describing string theory!

ESPRIT: Explaining Solutions to Physical Reasoning Tasks
Nazneen Fatema Rajani, Rui Zhang, Yi Chern Tan, Stephan Zheng, Jeremy Weiss, Aadit Vyas, Abhijit Gupta, Caiming Xiong, Richard Socher, Dragomir Radev, ACL 2020.

Sibling Rivalry dynamically trades off exploration and exploitation, automatically avoids local optima, and helps RL policies converge to the true optimal policy more easily.

Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards
Alex Trott, Stephan Zheng, Richard Socher, NeurIPS 2019.

We show that under the reparameterization trick, one can derive generalization bounds for reinforcement learning.

On the Generalization Gap in Reparameterizable Reinforcement Learning
Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher, ICML 2019.

Hierarchical recurrent VAEs can extrapolate multi-agent trajectories well over long horizons by modeling long-term goals.

Generating Multi-Agent Trajectories using Programmatic Weak Supervision
Eric Zhan, Stephan Zheng, Yisong Yue, Patrick Lucey, ICLR 2019.
[PDF] [Demo]

Neural Fingerprinting detects state-of-the-art adversarial attacks with near-perfect detection rates on MNIST, CIFAR and MiniImagenet!

Detecting Adversarial Examples via Neural Fingerprinting
Stephan Zheng*, Sumanth Dathathri*, Richard Murray, Yisong Yue. (*equal contribution)
[PDF] [Code]

Multi-resolution learning accelerates learning interpretable tensor models from high-resolution spatiotemporal data. Such tensor models can model rich multi-agent correlations, as in basketball and social interactions between fruit flies.

Multiresolution Tensor Learning for Efficient and Interpretable Spatial Analysis
Jung Yeon Park, Kenneth Theo Carr, Stephan Zheng, Yisong Yue, Rose Yu, ICML 2020.

Multi-resolution Tensor Learning for Large-Scale Spatial Data
Stephan Zheng, Rose Yu, Yisong Yue.

Hierarchical policy networks can learn to move like professional basketball players! Using the predicted long-term goals of human players significantly improves track generation and fools professional sports analysts.

Generating Long-term Trajectories Using Deep Hierarchical Networks
Stephan Zheng, Yisong Yue, Patrick Lucey, Neural Information Processing Systems (NIPS) 2016.
[PDF] [Data]

Stochastic data augmentation makes neural networks more robust to input perturbations, such as JPEG corruption. Deployed in Google Image Search!

Improving the Robustness of Deep Neural Networks via Stability Training
Stephan Zheng, Yang Song, Ian Goodfellow, Thomas Leung, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016.


My master's thesis relates different types of topological branes using exotic dualities. Exotic duals, first introduced by Edward Witten, can be constructed by using Morse theory on complexifications of quantum fields.

Exotic path integrals and dualities, Stephan Zheng, 2011.

Janus particles are synthetic particles that could provide localized drug delivery, for example. We study the electromagnetic screening effects of Janus particles in plasmas by (numerically) solving the non-linear Poisson-Boltzmann equation. This paper built on my bachelor's thesis, joint with Elma C. Gallardo.

Screening of Heterogeneous Surfaces: Charge Renormalization of Janus Particles
Niels Boon, Elma C. Gallardo, Stephan Zheng, Eelco Eggen, Monique Dijkstra, Rene van Roij, Journal of Physics: Condensed Matter 22 (2010) 10410.