Attention is Not What You Need

Original Paper Summary: This paper views attention as a particular instance of tensor lifting: a hidden vector is mapped into a high-dimensional space of pairwise interactions, and learning proceeds by constraining this lifted tensor through gradient descent. As an alternative, the authors propose “an attention-free sequence model built around Grassmann flows.” Instead of lifting from token space to pairwise interaction space, they lift to a Grassmann manifold: the hidden states are points on this manifold, and each forward pass traces a path across it. This path is a flow that can be learned. ...
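To make the idea concrete, here is a minimal sketch of my own (not the paper's actual architecture): the hidden state is a k-dimensional subspace of R^d, represented by a d × k matrix with orthonormal columns, and each token nudges that subspace along a learned tangent direction before a QR retraction maps it back onto the manifold. The names (`GrassmannFlowCell`, `to_velocity`), the fixed step size, and the linear parameterization of the velocity are all assumptions for illustration.

```python
# Illustrative sketch only: hidden states live on the Grassmann manifold
# Gr(k, d), represented by d x k matrices with orthonormal columns. Each
# token produces a learned tangent-space "velocity"; a QR retraction keeps
# the updated state on the manifold, so the forward pass traces a path.
import torch
import torch.nn as nn


class GrassmannFlowCell(nn.Module):
    """One step of an attention-free recurrence whose state is a subspace."""

    def __init__(self, d: int, k: int, d_in: int):
        super().__init__()
        self.d, self.k = d, k
        # Learned map from the input token to a d x k velocity matrix (assumed form).
        self.to_velocity = nn.Linear(d_in, d * k)

    def forward(self, x_t: torch.Tensor, U: torch.Tensor) -> torch.Tensor:
        # U: (d, k) orthonormal basis of the current subspace.
        V = self.to_velocity(x_t).view(self.d, self.k)
        # Project V onto the tangent space of Gr(k, d) at U: (I - U U^T) V.
        V_tan = V - U @ (U.T @ V)
        # Small step along the tangent, then QR retraction back to the manifold.
        Q, _ = torch.linalg.qr(U + 0.1 * V_tan)
        return Q  # new orthonormal basis, i.e. the next point on the path


if __name__ == "__main__":
    d, k, d_in, seq_len = 16, 4, 8, 5
    cell = GrassmannFlowCell(d, k, d_in)
    U, _ = torch.linalg.qr(torch.randn(d, k))   # initial subspace
    for x_t in torch.randn(seq_len, d_in):      # trace the flow over the tokens
        U = cell(x_t, U)
    print(U.shape, torch.allclose(U.T @ U, torch.eye(k), atol=1e-5))
```

Because the state is always an orthonormal basis, the recurrence is attention-free and costs O(d·k²) per token rather than growing with sequence length; how the paper actually parameterizes the flow may differ.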

January 2, 2026 · 2 min · Me