Shenao Zhang

I am a third-year Ph.D. student at Northwestern University advised by Prof. Zhaoran Wang. I received my M.S. degree from Georgia Tech, where I was fortunate to work with Prof. Tuo Zhao and Prof. Bo Dai. I obtained my Bachelor's degree from South China University of Technology and visited Berkeley EECS during my undergrad.
Previously, I interned at Google, Microsoft, ByteDance Seed, and Tencent AI Lab.

CV / Email / Google Scholar / LinkedIn / Twitter

Research

My research centers on Large Language Models (LLMs) and Reinforcement Learning (RL). I'm currently interested in LLM/agent reasoning and alignment. The ultimate goal of my research is to build systems that actively explore and self-improve to achieve superhuman intelligence. Previously, I developed data-efficient decision-making algorithms with applications to robotic and multi-agent systems.

(* indicates equal contribution, ^† indicates equal advising)

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
Shenao Zhang, Yaqing Wang, Yinxiao Liu, Tianqi Liu, Peter Grabowski, Eugene Ie, Zhaoran Wang^†, Yunxuan Li^†
Preprint, 2025
Featured by MIT Tech Review China
paper / code / poster / thread

BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
Han Zhong, Yutong Yin, Shenao Zhang, Xiaojun Xu, Yuanxin Liu, Yifei Zuo, Zhihan Liu, Boyi Liu, Sirui Zheng, Hongyi Guo, Liwei Wang, Mingyi Hong, Zhaoran Wang
International Conference on Machine Learning (ICML), 2025*
paper / poster / thread

Offline Reinforcement Learning for LLM Multi-Step Reasoning
Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu
Findings of the Association for Computational Linguistics (ACL), 2025
ICLR Workshop on Reasoning and Planning for LLMs, 2025 (Oral)
Featured by HF Daily Papers
paper / code / thread

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
International Conference on Machine Learning (ICML), 2025
paper / code / poster / thread

Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Shenao Zhang, Donghan Yu, Hiteshi Sharma, Ziyi Yang, Shuohang Wang, Hany Hassan, Zhaoran Wang
Transactions on Machine Learning Research (TMLR)
ICML AutoRL Workshop, 2024 (Best Paper Award)
Featured by HF Daily Papers
paper / code / models / thread

Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
Neural Information Processing Systems (NeurIPS), 2024
paper

Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Zhihan Liu, Hao Hu, Shenao Zhang, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
International Conference on Machine Learning (ICML), 2024*
paper / code / website / thread

Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Feng Gao, Liangzhi Shi, Shenao Zhang, Zhaoran Wang, Yi Wu
International Conference on Machine Learning (ICML), 2024
paper

Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang, Boyi Liu, Zhaoran Wang^†, Tuo Zhao^†
Neural Information Processing Systems (NeurIPS), 2023
paper / code

Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
Neural Information Processing Systems (NeurIPS), 2023 (Spotlight)*
paper / code

Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
Shenao Zhang, Wanxin Jin, Zhaoran Wang
International Conference on Machine Learning (ICML), 2023
paper / website

Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Shenao Zhang
Neural Information Processing Systems (NeurIPS), 2022
paper / video / website / poster / video

Learning Meta Representation for Agents in Multi-Agent Reinforcement Learning
Shenao Zhang, Li Shen, Lei Han, Li Shen
Conference on Lifelong Learning Agents (CoLLAs), 2023 (Oral)
paper / poster

How Can LLM Guide RL? A Value-Based Approach
Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang
Preprint, 2023
paper / code

Structure-Regularized Attention for Deformable Object Representation
Shenao Zhang, Li Shen, Zhifeng Li, Wei Liu
NeurIPS Workshop on Object Representations for Learning and Reasoning, 2020
paper / code / website / poster

Professional Service

Conference Review: NeurIPS 2020-25, ICLR 2022-25, ICML 2022-25, AISTATS 2022-25, COLM 2024-25.

Journal Review: Neurocomputing, Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Transactions on Machine Learning Research (TMLR).

Teaching

Graduate Teaching Assistant: Head TA of CS 7648: Interactive Robot Learning (Fall 2021) at Georgia Tech.

Source code from Jon Barron's website
