The researchers compared two variants of their MatMul-free LM against the strong Transformer++ architecture used in Llama-2, across multiple model sizes.
Interestingly, their scaling projections show that the MatMul-free LM uses additional compute more efficiently than the Transformer++ architecture, with its performance improving faster as compute is scaled up.
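To make the idea of a scaling projection concrete, here is a minimal sketch of how one might fit power-law scaling curves to loss-versus-compute measurements and project where the two architectures' curves would cross. The data points, fitted coefficients, and crossover calculation below are illustrative assumptions for demonstration only, not figures from the paper.

```python
import numpy as np

# Hypothetical (compute, loss) points for each architecture; purely illustrative,
# not the paper's measurements.
compute = np.array([1e19, 1e20, 1e21, 1e22])            # training FLOPs
loss_transformer_pp = np.array([3.10, 2.80, 2.55, 2.35])
loss_matmul_free = np.array([3.40, 3.00, 2.65, 2.38])

def fit_power_law(c, l):
    """Fit loss = a * compute**(-b) via linear regression in log-log space."""
    slope, log_a = np.polyfit(np.log(c), np.log(l), 1)
    return np.exp(log_a), -slope

a1, b1 = fit_power_law(compute, loss_transformer_pp)
a2, b2 = fit_power_law(compute, loss_matmul_free)

# A larger exponent b means loss falls faster as compute grows.
# The projected crossover is where the two fitted curves intersect:
#   a1 * C**(-b1) == a2 * C**(-b2)  =>  C = (a2 / a1) ** (1 / (b2 - b1))
crossover = (a2 / a1) ** (1.0 / (b2 - b1))
print(f"Transformer++ exponent: {b1:.3f}, MatMul-free exponent: {b2:.3f}")
print(f"Projected crossover compute: {crossover:.2e} FLOPs")
```

With these toy numbers the MatMul-free curve is steeper, so the fit projects a compute budget beyond which it would match and then overtake the baseline; the actual crossover point depends entirely on the measured losses.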
The researchers also evaluated the quality of the models on several language tasks. The 2.7B MatMul-free LM outperformed its Transformer++ counterpart on two challenging question-answering benchmarks, ARC-Challenge and OpenBookQA, while maintaining comparable performance on the other tasks.