Accepted at LLA 2026 & ReALM-GEN 2026 · ICLR 2026 Workshops
Diffusion models and flow matching have become cornerstones of robotic imitation learning, yet they suffer from a structural inefficiency: inference is typically bound to a fixed integration schedule that is agnostic to state complexity. This paradigm forces the policy to expend the same computational budget on trivial motions as on complex tasks. We introduce Generative Control as Optimization (GeCO), a time-unconditional framework that transforms action synthesis from trajectory integration into iterative optimization.
GeCO learns a stationary velocity field in the action-sequence space where expert behaviors form stable attractors. As a result, test-time inference becomes adaptive: it can exit early for simple states while refining longer for difficult ones. The same stationary geometry also provides an intrinsic, training-free safety signal, since the field norm at the optimized action acts as a robust out-of-distribution detector.
We validate GeCO on standard simulation benchmarks and demonstrate that it scales naturally to π0-series Vision-Language-Action models. As a plug-and-play replacement for standard flow-matching heads, GeCO improves both success rate and efficiency while offering an optimization-native mechanism for safer deployment.
Replace fixed-step trajectory integration with an adaptive iterative optimization process in action space.
Allocate more refinement to hard states and stop early on easy states, improving inference efficiency.
Use the stationary field norm as a training-free signal for anomaly detection and safer deployment.
Conventional diffusion and flow-matching policies learn time-dependent vector fields and rely on a pre-defined inference schedule. GeCO removes this dependency by learning a single stationary velocity field in the action-sequence space. Expert actions become stable attractors, turning action generation into a convergence problem rather than a rollout through fictitious time.
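The convergence view above can be sketched in a few lines. The snippet below is a minimal illustration, not the actual GeCO implementation: `velocity_fn` is a hypothetical stand-in for the trained stationary network (here a toy field whose attractor is a fixed expert action), and the loop simply follows the field until its norm falls below a tolerance, which is what makes early exit possible.

```python
import numpy as np

# Toy stand-in for a learned stationary velocity field. The expert action
# A_STAR acts as a stable attractor: the field points toward it and
# vanishes exactly at it. In GeCO this would be a neural network
# conditioned on the observation.
A_STAR = np.array([0.5, -0.2])

def velocity_fn(action):
    return A_STAR - action  # zero at the attractor

def geco_inference(a0, step=0.5, tol=1e-3, max_iters=100):
    """Iterate a <- a + step * v(a) until the field norm drops below tol.

    Because the field is stationary (no time conditioning), the loop can
    stop as soon as it converges -- few steps for easy states, more for
    hard ones -- instead of running a fixed integration schedule.
    """
    a = a0.copy()
    for k in range(max_iters):
        v = velocity_fn(a)
        if np.linalg.norm(v) < tol:  # early exit: reached an attractor
            return a, k
        a = a + step * v
    return a, max_iters

action, iters = geco_inference(np.zeros(2))
```

With this toy linear field the error halves every step, so the loop exits after roughly ten iterations rather than a fixed budget.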
To ensure robust optimization in continuous control, GeCO introduces a velocity rescaling mechanism that modulates the field magnitude based on the distance to the expert manifold. This creates a geometric sink around valid action modes and yields a practical inference loop that is both adaptive and stable.
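One simple way to realize such a rescaling, shown here as an assumed sketch rather than the paper's exact mechanism, is to keep the field's direction but squash its magnitude through a saturating function: distant points take bounded steps (no overshoot), while near an attractor the rescaled norm still vanishes smoothly, producing the sink-like geometry described above.

```python
import numpy as np

def rescale_velocity(v, max_norm=1.0, eps=1e-8):
    """Hypothetical velocity rescaling: preserve direction, bound magnitude.

    tanh saturates the norm at max_norm far from the expert manifold,
    preventing overshoot, while for small norms tanh(x) ~ x, so the field
    still decays to zero at valid action modes (a 'geometric sink').
    """
    norm = np.linalg.norm(v)
    return v * (max_norm * np.tanh(norm / max_norm) / (norm + eps))

v_far = rescale_velocity(np.array([10.0, 0.0]))   # capped below max_norm
v_near = rescale_velocity(np.array([0.01, 0.0]))  # almost unchanged
```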
Because the learned field is stationary, its residual norm directly reflects whether the current state-action pair lies near a learned in-distribution manifold. This provides a simple but effective uncertainty estimate without auxiliary heads, ensembles, or extra safety networks.
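As a concrete illustration of this idea (using the same toy attractor field standing in for a trained GeCO network), the anomaly score is just the norm of the stationary field evaluated at the optimized action: small near the expert manifold, large away from it, with no auxiliary head or ensemble required.

```python
import numpy as np

# Toy stationary field whose attractor is the expert action A_STAR;
# a stand-in for the trained GeCO velocity network.
A_STAR = np.array([0.5, -0.2])
velocity_fn = lambda a: A_STAR - a

def ood_score(action):
    """Training-free anomaly signal: the residual field norm at an action.

    Near a learned in-distribution mode the stationary field vanishes, so
    the score is near zero; off-manifold actions leave a large residual.
    """
    return np.linalg.norm(velocity_fn(action))

in_dist = ood_score(A_STAR + 0.01)          # near the attractor -> small
out_dist = ood_score(np.array([5.0, 5.0]))  # far from any mode -> large
```

At deployment time, thresholding this score after the inference loop converges gives a cheap trigger for fallback or human handover.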
Coming Soon