More small models: Orca-Math 7B from Microsoft
The authors write: “Our findings show that smaller models are valuable in specialized settings where they can match the performance of much larger models but with a limited scope. By training Orca-Math on a small dataset of 200,000 math problems, we have achieved performance levels that rival or surpass those of much larger models.” They report: “Orca-Math achieves 86.81% on GSM8k pass@1, exceeding the performance of much bigger models including general models (e.g. LLAMA-2-70, Gemini Pro and GPT-3.5) and math-specific models (e.g. MetaMath-70B and WizardMath-70B). Note that the base model (Mistral-7B) achieves 37.83% on GSM8K.”
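To make the headline number concrete, here is a minimal sketch of what a GSM8K pass@1 evaluation looks like with Hugging Face transformers and the public gsm8k dataset. The model id is a placeholder assumption (the Mistral-7B base, not the Orca-Math checkpoint itself), answers are compared by their final number, and greedy decoding corresponds to a single attempt per problem, which is exactly what pass@1 measures.

```python
# Minimal sketch of a GSM8K pass@1 check with greedy decoding.
# MODEL_ID is a placeholder assumption (the Mistral-7B base), not the Orca-Math release.
import re

from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mistralai/Mistral-7B-v0.1"  # assumption: swap in your own fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

def final_number(text):
    """GSM8K references end with '#### <answer>'; take the last number in the text."""
    numbers = re.findall(r"-?\d+\.?\d*", text.replace(",", ""))
    return numbers[-1] if numbers else None

dataset = load_dataset("gsm8k", "main", split="test")
correct, total = 0, 100  # a small slice keeps the sketch cheap to run
for example in dataset.select(range(total)):
    prompt = f"Question: {example['question']}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)  # greedy = one attempt
    completion = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    if final_number(completion) == final_number(example["answer"]):
        correct += 1

print(f"pass@1 on {total} problems: {correct / total:.2%}")
```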
Over the course of my career I have built many models like this. Language models power both search and recommendation systems: they classify queries by intent, which matters especially when a single search application serves several verticals; they autocomplete queries; and they handle a range of other tasks needed to understand queries and documents/items and match them properly.
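To make the query-intent piece concrete, here is a minimal sketch of routing a query to a vertical with a compact encoder. It assumes DistilBERT as the backbone and a hypothetical three-label vertical scheme; in practice the classification head would first be fine-tuned on labeled queries, so the untrained weights below route essentially at random.

```python
# Minimal sketch of compact-model query intent classification for a multi-vertical search box.
# Assumptions: "distilbert-base-uncased" as backbone and a hypothetical label set;
# the classification head is untrained here, so predictions are random until fine-tuned.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

LABELS = ["products", "help_articles", "store_locations"]  # hypothetical verticals

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS)
)
model.eval()

def route_query(query):
    """Return the vertical this query should be routed to."""
    inputs = tokenizer(query, return_tensors="pt", truncation=True, max_length=64)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(route_query("return policy for damaged headphones"))
```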
These applications demand very high throughput and low latency, which makes compact models a practical necessity. Search is not the only case: many other applications have the same throughput and latency requirements, and all of them benefit from keeping cloud compute costs down.
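For a rough sense of the latency side of that argument, a short, self-contained timing loop like the one below (again with DistilBERT as a stand-in for a compact model) is usually enough to see whether a model fits in the request path; the numbers it prints depend entirely on hardware, sequence length, and batching.

```python
# Minimal sketch of a latency/throughput check for a compact encoder;
# the printed numbers are purely illustrative and hardware-dependent.
import time

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

queries = ["wireless headphones under 100", "how do I reset my password"] * 50

start = time.perf_counter()
with torch.no_grad():
    for q in queries:
        inputs = tokenizer(q, return_tensors="pt", truncation=True, max_length=32)
        model(**inputs)
elapsed = time.perf_counter() - start

print(f"avg latency: {1000 * elapsed / len(queries):.1f} ms/query, "
      f"throughput: {len(queries) / elapsed:.0f} queries/s")
```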
Compact models excel at specialized tasks, and that realization will strongly influence how Natural Language Processing (NLP) is applied in industry. It points to real potential for small models to improve efficiency and drive innovation across many sectors.