Super Deepthroat 1211b Work
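Self-attention in a standard transformer has O(n²) time and memory complexity in the sequence length n. For a 1M-token context, that is on the order of 10¹² pairwise attention scores per head per layer, which is intractable to compute directly. The "work" solves this via: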
Here is a feature profile for the Super Deepthroat 1211b Work. The Core Philosophy: "The Frictionless Shift"
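To put the quadratic attention cost mentioned above in concrete numbers, here is a minimal sketch. It only does the arithmetic for one attention head in one layer. The function name `attention_score_cost` and the default of 2 bytes per score (a 16-bit float) are assumptions made for illustration, not details taken from the text.

```python
def attention_score_cost(n_tokens, bytes_per_score=2):
    """Return (pairwise scores, bytes to store them) for one head in one layer.

    Assumes a full n x n score matrix is materialized, as in naive attention.
    bytes_per_score=2 is an illustrative assumption (16-bit floats).
    """
    n_scores = n_tokens * n_tokens  # every token attends to every token
    return n_scores, n_scores * bytes_per_score


if __name__ == "__main__":
    for n in (1_000, 100_000, 1_000_000):
        scores, nbytes = attention_score_cost(n)
        print(f"n={n:,}: {scores:.1e} scores, {nbytes / 1e9:,.3f} GB")
```

Under these assumptions, a 1M-token context needs 10¹² scores, or about 2 TB for a single head in a single layer, which is why a naive implementation does not scale to that length.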