Research
My research centers on Large Language Models (LLMs) and Reinforcement Learning (RL). I’m currently interested in the efficient alignment of LLMs and in autonomous LLM agents with advanced planning capabilities. My ultimate goal is to build models that self-improve by actively synthesizing data and learning to reason, eventually reaching superhuman intelligence. Previously, I developed data-efficient decision-making algorithms with applications to robotic and multi-agent systems.
(* indicates equal contribution, † indicates equal advising)
|
|
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang
Preprint, 2024
paper / code / thread
|
|
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment
Shenao Zhang, Donghan Yu, Hiteshi Sharma, Ziyi Yang, Shuohang Wang, Hany Hassan, Zhaoran Wang
ICML AutoRL Workshop, 2024 (Best Paper Award)
paper / code / models / thread
|
|
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu*, Miao Lu*, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose Blanchet, Zhaoran Wang
Neural Information Processing Systems (NeurIPS), 2024
paper
|
|
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
Zhihan Liu*, Hao Hu*, Shenao Zhang*, Hongyi Guo, Shuqi Ke, Boyi Liu, Zhaoran Wang
International Conference on Machine Learning (ICML), 2024
paper / code / website / thread
|
|
Adaptive-Gradient Policy Optimization: Enhancing Policy Learning in Non-Smooth Differentiable Simulations
Feng Gao*, Liangzhi Shi*, Shenao Zhang, Zhaoran Wang, Yi Wu
International Conference on Machine Learning (ICML), 2024
paper
|
|
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang, Boyi Liu, Zhaoran Wang†, Tuo Zhao†
Neural Information Processing Systems (NeurIPS), 2023
paper / code
|
|
Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration
Zhihan Liu*, Miao Lu*, Wei Xiong*, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang
Neural Information Processing Systems (NeurIPS), 2023 (Spotlight)
paper / code
|
|
Adaptive Barrier Smoothing for First-Order Policy Gradient with Contact Dynamics
Shenao Zhang, Wanxin Jin, Zhaoran Wang
International Conference on Machine Learning (ICML), 2023
paper / website
|
|
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Shenao Zhang
Neural Information Processing Systems (NeurIPS), 2022
paper / video / website / poster / video
|
|
Learning Meta Representation for Agents in Multi-Agent Reinforcement Learning
Shenao Zhang, Li Shen, Lei Han, Li Shen
Conference on Lifelong Learning Agents (CoLLAs), 2023 (Oral)
paper / poster
|
|
How Can LLM Guide RL? A Value-Based Approach
Shenao Zhang*, Sirui Zheng*, Shuqi Ke, Zhihan Liu, Wanxin Jin, Jianbo Yuan, Yingxiang Yang, Hongxia Yang, Zhaoran Wang
Preprint, 2023
paper / code
|
|
Asking Before Action: Gather Information in Embodied Decision Making with Language Models
Xiaoyu Chen, Shenao Zhang, Pushi Zhang, Li Zhao, Jianyu Chen
Preprint, 2023
paper
|
|
Structure-Regularized Attention for Deformable Object Representation
Shenao Zhang, Li Shen, Zhifeng Li, Wei Liu
NeurIPS Workshop on Object Representations for Learning and Reasoning, 2020
paper / code / website / poster
|
Professional Service
Conference Review: NeurIPS 2020-24, ICLR 2022-24, ICML 2022-24, AISTATS 2022-24, COLM 2024, RSS 2021.
Journal Review: Neurocomputing, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
|
Teaching
Graduate Teaching Assistant: Head TA of CS 7648: Interactive Robot Learning (Fall 2021) at Georgia Tech.