The team has released the width-pruned version of the model on Hugging Face under the Nvidia Open Model License, which allows for commercial use. This makes it accessible to a wider range of users and developers who can benefit from its efficiency and performance.
“Pruning and classical knowledge distillation is a highly cost-effective method to progressively obtain LLMs [large language models] of smaller size, achieving superior accuracy compared to training from scratch across all domains,” the researchers wrote. “It serves as a more effective and data-efficient approach compared to either synthetic-data-style fine-tuning or pretraining from scratch.”
This work is a reminder of the value and importance of the open-source community to the progress of AI. Pruning and distillation are part of a wider body of research that is enabling companies to optimize and customize LLMs at a fraction of the normal cost. Other notable works in the field include Sakana AI’s evolutionary model-merging algorithm, which makes it possible to assemble parts of different models to combine their strengths without the need for expensive training resources.