Software engineers are developing a way to run AI language models without matrix multiplication

[Figure: Overview of the MatMul-free LM. The sequence of operations is shown for vanilla self-attention (top left), the MatMul-free token mixer (top right), and ternary accumulation. The MatMul-free LM uses a token mixer…]
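The core trick behind such "MatMul-free" models is constraining weights to ternary values {-1, 0, +1}, so a matrix-vector product reduces to additions, subtractions, and skips, with no multiplications at all. A minimal sketch of that idea (the function name and shapes are illustrative, not from the work described here):

```python
import numpy as np

def ternary_matvec(W, x):
    """Compute W @ x using only accumulation, assuming W has
    entries in {-1, 0, +1}: +1 adds the input element, -1
    subtracts it, and 0 skips it entirely."""
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            if W[i, j] == 1:
                out[i] += x[j]   # add instead of multiply
            elif W[i, j] == -1:
                out[i] -= x[j]   # subtract instead of multiply
            # W[i, j] == 0: contributes nothing, skip
    return out

# Sanity check against an ordinary matrix product
W = np.array([[1, 0, -1],
              [0, 1, 1]])
x = np.array([2.0, 3.0, 5.0])
print(ternary_matvec(W, x))  # → [-3.  8.], same as W @ x
```

Real implementations would fuse this into hardware-friendly kernels rather than Python loops, but the arithmetic savings come from exactly this substitution.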
