This page expands on my research vision, ongoing projects, and publications. For a concise overview, see the home page.
I work on building foundation-model agents that unify internet-scale knowledge with decision-making, spanning offline/unsupervised RL, generative world models, system-2 reasoning, tool use, and scalable oversight. I aim to move beyond purely autoregressive training toward objectives and representations that support interactive learning, planning, and self-improvement.
Large-scale evidence that training on distribution-matched but incorrect chains of thought (CoTs) can outperform training on correct but distribution-mismatched data. This reframes "reasoning supervision" and motivates RL from unverified trajectories.
Iterative visual plan refinement that uses VLM feedback and a self-consistency objective to improve a video world model for decision-making, with strong gains on simulated and real robot manipulation videos.
Small LLMs act as peer reviewers while a larger LLM acts as an area chair, reducing cost while improving alignment and reliability across generation, multimodal evaluation, and reasoning tasks.