Nov 2025 · 5 min read
TL;DR: Can we use generative video models as world models for robots? Yes, but they hallucinate and drift. We fix this with a self-conditioning consistency mechanism and VLM-guided feedback, letting agents plan more reliably in visual space.
Oct 2025 · 7 min read
TL;DR: We found something surprising: training LMs on incorrect but in-distribution reasoning traces works better than training on correct but out-of-distribution human traces. This suggests that "reasoning" in LMs is heavily tied to distribution matching.
Jun 2025 · 4 min read
TL;DR: A deep dive into the stability issues that arise when scaling Q-learning with Transformers. My notes on reproducing recent offline RL papers and where they break down.
More posts coming soon...