Running a 400B-parameter model on a handheld device shows how far AI capabilities have advanced and makes powerful tools more accessible.
The prospect of future models reaching 100 tokens per second (t/s) points to rapid progress in both model efficiency and hardware, which could revolutionize mobile AI applications.
Concerns
The reliance on a mixture-of-experts (MoE) architecture raises questions about the model's actual performance and efficiency: only a subset of experts is active for each token, so the 400B total does not reflect the parameters actually used per token.
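To make the mixture-of-experts concern concrete, here is a minimal sketch of the arithmetic behind "total" versus "active" parameters. The function name `moe_param_counts` and every number in the example (shared parameters, expert size, expert count, top-k) are hypothetical, chosen only so the total comes to 400B. They are not the actual configuration of the model discussed here.

```python
def moe_param_counts(shared, per_expert, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for a simple MoE model.

    shared      -- parameters every token uses (attention, embeddings, ...)
    per_expert  -- parameters in one expert, summed over all layers
    num_experts -- number of experts the router can choose from
    top_k       -- number of experts the router activates per token
    """
    if not 1 <= top_k <= num_experts:
        raise ValueError("top_k must be between 1 and num_experts")
    total = shared + per_expert * num_experts
    active = shared + per_expert * top_k
    return total, active


# Hypothetical split, picked only to sum to 400B (not the real model's config).
total, active = moe_param_counts(
    shared=16e9, per_expert=3e9, num_experts=128, top_k=8
)
print(f"total: {total / 1e9:.0f}B, active per token: {active / 1e9:.0f}B")
```

Under these assumed numbers, only a tenth of the weights take part in computing any one token, which is why a headline parameter count alone says little about per-token compute. All of the weights still have to be stored somewhere the device can reach.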