Weights, code, and dataset all open source, with performance surpassing Mistral-7B: Apple's small model is here
Are small models becoming a trend?
This week, OpenAI launched the small model GPT-4o-mini, officially kicking off the small-model race. The latest entrant is Apple.
Recently, Apple, one of the research institutions behind the DataComp-LM (DCLM) project, released the open-source DCLM-7B model on Hugging Face. Its performance surpasses Mistral-7B and approaches that of other leading open-source models, including Llama 3 and Gemma.
Paper link: https://arxiv.org/pdf/2406.11794
Project link: https://huggingface.co/apple/DCLM-7B
Vaishaal Shankar of Apple's machine learning team, one of the paper's authors, described DCLM as "the best truly open-source model," because in addition to the model weights, the training code and the pre-training dataset are also open source.
Research Introduction
A current challenge in evaluating large language models (LLMs) is the lack of controlled comparisons. LLM studies often compare models that differ in architecture, compute, or hyperparameters, making it difficult to disentangle the factors that influence language model quality.
Based on this, the research team proposed DCLM, a new benchmark for comparing language model training data. It is the first benchmark for language model training-data curation, aiming to let LLMs improve performance through the design of high-quality datasets, especially in the multimodal domain.
The research team found that model-based filtering, in which machine learning (ML) models automatically filter and select high-quality data from larger datasets, may be the key to building high-quality training sets.
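To make this concrete, here is a minimal Python sketch of model-based filtering that keeps only documents scored above a threshold by a quality classifier. The classifier file, label name, and threshold are illustrative placeholders, not the settings actually used for DCLM-BASELINE; fastText is used here simply as an example of a lightweight text classifier.

```python
import fasttext  # lightweight text-classification library, used here as the example filter

# Hypothetical quality classifier: the model file, label name, and threshold are
# illustrative placeholders, not the settings used for DCLM-BASELINE.
classifier = fasttext.load_model("quality_classifier.bin")
THRESHOLD = 0.5

def keep(document: str) -> bool:
    """Keep a document only if the classifier's 'high quality' probability clears the threshold."""
    # fastText's predict() expects single-line input, so flatten newlines first.
    labels, probs = classifier.predict(document.replace("\n", " "))
    score = dict(zip(labels, probs)).get("__label__hq", 0.0)
    return score >= THRESHOLD

raw_pool = [
    "A clear explanation of gradient descent, with worked examples and derivations.",
    "CLICK HERE buy now limited offer!!!",
]
filtered = [doc for doc in raw_pool if keep(doc)]
print(f"Kept {len(filtered)} of {len(raw_pool)} documents")
```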
The overall idea of DCLM is simple: use a standardized experimental framework, with a fixed model architecture, training code, hyperparameters, and evaluation, to determine which data curation strategy works best for training a high-performing model.
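As a rough illustration of this setup, the sketch below varies only the data-curation step while holding the training recipe and evaluation fixed; the function names are hypothetical stand-ins, not DCLM's actual API.

```python
# Hypothetical sketch of the DCLM setup: only data curation changes between runs,
# so differences in the final score can be attributed to the data.

def train(dataset):
    """Stand-in for the fixed training recipe (architecture, hyperparameters, token budget)."""
    return {"num_docs": len(dataset)}

def evaluate(model):
    """Stand-in for the fixed evaluation suite (e.g., MMLU and other NLU tasks)."""
    return float(model["num_docs"])  # placeholder score

def run_dclm_experiment(curation_strategies, raw_pool):
    scores = {}
    for name, curate in curation_strategies.items():
        dataset = curate(raw_pool)      # the only step that changes between runs
        model = train(dataset)          # same architecture, code, and hyperparameters
        scores[name] = evaluate(model)  # same evaluation
    return scores

pool = ["useful doc a", "useful doc b", "spammy doc", "useful doc c"]
strategies = {
    "no_filter": lambda docs: list(docs),
    "drop_spam": lambda docs: [d for d in docs if "spam" not in d],
}
print(run_dclm_experiment(strategies, pool))
```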
Using DCLM, the research team constructed a high-quality dataset, DCLM-BASELINE, and used it to train DCLM-7B, a 7B-parameter model, from scratch.
Details of the DCLM-7B model
DCLM-7B is pre-trained with the OpenLM framework and reaches 64% 5-shot accuracy on the MMLU benchmark, comparable to Mistral-7B-v0.3 (63%) and Llama 3 8B (66%). Its average performance across 53 natural language understanding tasks is likewise comparable to those two models, while it requires only 1/6 of the compute of Llama 3 8B. The following are DCLM-7B's evaluation results on various tasks (partial):
The comparison results of DCLM-7B with other models of the same size are shown in the table below:
Notably, most other models open only their weights while keeping the data closed. This is why Vaishaal Shankar describes DCLM as "truly open source."
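Since the weights are published on Hugging Face, the checkpoint can in principle be tried with a few lines of Python. The snippet below is a minimal sketch that assumes the standard transformers loading path works for this repository; the model card may additionally require the open_lm package, so treat it as illustrative rather than official usage.

```python
# Minimal sketch for trying the released checkpoint via Hugging Face transformers.
# Note: the DCLM-7B repository may additionally depend on the open_lm package;
# consult the model card for the exact setup.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "apple/DCLM-7B"  # repository linked above
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Machine learning is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```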
Reference link: https://venturebeat.com/ai/apple-shows-off-open-ai-prowess-new-models-outperform-mistral-and-hugging-face-offerings/