Zhengxuan Wu

/blog

my core dump about interpretability, language models, and other stuff.

July 21, 2025
After discussing with my advisors, I plan to graduate very soon. I will share a few thoughts on what I'm looking for in my next job (as a self-reflection). This is based on my current life situation, value proposition, and past experience (> 5 YoE in tech w/ engineering background prior to Ph.D.). I love studying science and being pragmatic. At the same time, I tend to follow my heart in some quite important decisions.
July 12, 2025
Representation steering is a powerful tool for understanding and controlling the behavior of language models. In this post, I share lessons learned from our recent work on training a better representation steering method with a preference-based training objective.
April 05, 2024
Representation finetuning (ReFT) represents a novel approach to parameter-efficient, powerful, and interpretable fine-tuning of language models. It draws inspiration from our interpretability work in distributed alignment search (DAS). Instead of training any model weights, we train interventions that edit representations on-the-fly. We demonstrate that editing a very limited number of representations is sufficient to achieve or get close to the state-of-the-art (SoTA) performance across a wide range of tasks.
May 09, 2023
Obtaining robust, human-interpretable explanations of large, general-purpose language models is an urgent goal for AI. Building on the theory of causal abstraction, we release this generic library, which encapsulates Boundless DAS, the method introduced in our paper for finding representations that play a given causal role in LLMs with billions of parameters.
June 30, 2022
It took me years to transition from an aerospace engineering student to an NLP Ph.D. student. I want to share my experience as much as I can, so people can build on it to make their own experience even better. For my SOP, I have to credit my good friend Nelson F. Liu. I wrote my SOP based on his! I applied twice, and I am also happy to share the version from my failed attempt. One takeaway for me is that you need a big vision that is grounded in specific past experience.