Gert Lek

OT Activation Steering accepted at ICLR 2026 TTU Workshop

activation-steering · optimal-transport · dLLMs · ICLR

We are excited to announce that "An Optimal Transport View of Activation Steering In Masked Diffusion Models" has been accepted at the ICLR 2026 TTU Workshop (Main Track).

Motivation

Diffusion Large Language Models (dLLMs) offer a non-autoregressive alternative to left-to-right decoding, but inference-time control in dLLMs remains underdeveloped relative to autoregressive LLMs. Prior activation-steering methods for masked diffusion models (MDMs) focus primarily on concept negation and employ heuristics that do not explicitly optimize the transport objective.

Our Approach

Building on the Activation Transport (AcT) formulation from Rodriguez et al. (2025), we introduce an Optimal Transport (OT) view of activation steering for MDMs. Given contrastive prompt distributions, we learn a lightweight affine map that transports pooled activation distributions from a source behavior to a target behavior.

This perspective unifies common steering rules (activation addition, mean-shift, directional ablation) as special cases of an affine transport map, and motivates the use of the OT estimator that matches first- and second-order moments.
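As a concrete illustration, the closed-form OT map between two Gaussians fit to the source and target activation sets is affine, T(x) = A(x − μ_s) + μ_t, and matches both means and covariances. The sketch below is a minimal numpy implementation of that standard Gaussian (Bures–Wasserstein) transport map applied to pooled activation vectors; variable names and the toy data are illustrative, not the paper's actual setup:

```python
import numpy as np

def sqrtm_psd(M):
    # Symmetric PSD matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(M)
    w = np.clip(w, 0.0, None)
    return (V * np.sqrt(w)) @ V.T

def gaussian_ot_map(X_src, X_tgt, eps=1e-8):
    """Closed-form affine OT map between Gaussians fit to the two
    activation sets: T(x) = A (x - mu_s) + mu_t, where
    A = S_s^{-1/2} (S_s^{1/2} S_t S_s^{1/2})^{1/2} S_s^{-1/2}."""
    mu_s, mu_t = X_src.mean(0), X_tgt.mean(0)
    S_s = np.cov(X_src, rowvar=False)
    S_t = np.cov(X_tgt, rowvar=False)
    S_s_half = sqrtm_psd(S_s)
    S_s_inv_half = np.linalg.inv(S_s_half + eps * np.eye(len(mu_s)))
    A = S_s_inv_half @ sqrtm_psd(S_s_half @ S_t @ S_s_half) @ S_s_inv_half
    return lambda x: (x - mu_s) @ A.T + mu_t

# Toy pooled activations: two synthetic "behaviors" with different moments.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(2000, 4))
tgt = rng.normal(3.0, 1.0, size=(2000, 4)) * np.array([1.0, 2.0, 0.5, 1.0])
T = gaussian_ot_map(src, tgt)
moved = T(src)
# Transported activations match the target's first and second moments.
print(np.allclose(moved.mean(0), tgt.mean(0)))
```

Setting A = I recovers mean-shift steering, and a rank-one A along a concept direction recovers directional ablation, which is what makes these rules special cases of the affine transport family.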

Results

Across three state-of-the-art dLLMs (LLaDA-Instruct, LLaDA 1.5, Dream-Instruct), affine OT steering improves instruction-following accuracy (e.g., +6.5 to +11.9 absolute points) with no inference-time overhead.

Collaborators

This work is a collaboration with Chaoyi Zhu, Pin-Yu Chen, Robert Birke, and Lydia Y. Chen.