


All-round open source with no dead ends, Xingbo team's LLM360 makes large models truly transparent
Open source models are showing their vigorous vitality, not only the number is increasing, but the performance is getting better and better. Turing Award winner Yann LeCun also lamented: "Open source artificial intelligence models are on the road to surpassing proprietary models."
Proprietary models are in technical performance and innovation It shows great potential in terms of capabilities, but due to its non-open source characteristics, it hinders the development of LLM. Although some open source models provide practitioners and researchers with diverse choices, most only disclose the final model weights or inference code, and an increasing number of technical reports limit their scope to top-level design and surface statistics. . This closed-source strategy not only limits the development of open-source models, but also hinders the progress of the entire LLM research field to a great extent. This means that these models need to be more comprehensive and Share in depth, including training data, algorithm details, implementation challenges, and performance evaluation details.
Researchers from Cerebras, Petuum and MBZUAI jointly proposed LLM360. This is a comprehensive open source LLM initiative that advocates providing the community with everything related to LLM training, including training code and data, model checkpoints, and intermediate results. The goal of LLM360 is to make the LLM training process transparent and reproducible for everyone, thereby promoting the development of open and collaborative artificial intelligence research.
- Project web page: https://www.llm360.ai/
- Blog: https://www.llm360.ai/blog/introducing-llm360-fully-transparent-open-source-llms.html
- Researcher We developed the architecture of LLM360, focusing on its design principles and the rationale for being fully open source. They specify the components of the LLM360 framework, including specific details such as datasets, code and configuration, model checkpoints, metrics, and more. LLM360 sets an example of transparency for current and future open source models.
Researchers released two large-scale language models pre-trained from scratch under the open source framework of LLM360: AMBER and CRYSTALCODER. AMBER is a 7B English language model pre-trained based on 1.3T tokens. CRYSTALCODER is a 7B English and code language model pre-trained based on 1.4T tokens. In this article, the researchers summarize the development details, preliminary evaluation results, observations, and experiences and lessons learned from these two models. Notably, at the time of release, AMBER and CRYSTALCODER saved 360 and 143 model checkpoints during training, respectively.
Now, let’s take a look at the details of the article
## The framework of #LLM360
LLM360 will provide a standard for what data and codes need to be collected during the LLM pre-training process to ensure that existing work can be better circulated and shared in the community . It mainly contains the following parts:
1. Training data set and data processing code
Pre-training datasets are critical to the performance of large language models. Therefore, it is important to understand the pre-training data set to assess potential behavioral issues and biases. Additionally, publicly available pre-training datasets help improve LLM’s scalability when subsequently fine-tuned and adapted to various domains. Recent research shows that training on repeated data disproportionately reduces the final performance of the model. Therefore, exposing the original pre-training data helps avoid using duplicate data when fine-tuning downstream or continuing to pre-train in a specific domain. Based on the above reasons, LLM360 advocates the disclosure of raw data sets of large language models. Where appropriate, details about data filtering, processing, and training sequences should also be disclosed.
The content that needs to be rewritten is: 2. Training code, hyperparameters and configuration
Training code, hyperparameters, and configuration have a significant impact on the performance and quality of LLM training, but are not always publicly disclosed. In LLM360, researchers open source all the training code, training parameters and system configuration of the pre-training framework.
3. Model checkpoint is rewritten as: 3. Model checkpoint
Regularly saved model checkpoints are also Quite useful. Not only are they critical for failure recovery during training, but they are also useful for post-training research. These checkpoints allow subsequent researchers to continue training the model from multiple starting points without having to train from scratch, aiding in reproducibility. and in-depth research.
4. Performance indicators
Training an LLM often takes weeks to months , the evolutionary trends during training can provide valuable information. However, detailed logs and intermediate metrics of training are currently only available to those who have experienced them, which hinders comprehensive research on LLM. These statistics often contain key insights that are difficult to detect. Even a simple analysis such as variance calculations on these measures can reveal important findings. For example, the GLM research team proposed a gradient shrinkage algorithm that effectively handles loss spikes and NaN losses by analyzing the gradient specification behavior.
Amber
AMBER is the first member of the LLM360 "family". Also released are its fine-tuned versions: AMBERCHAT and AMBERSAFE.
What needs to be rewritten: details of data and model
Table 2 details AMBER’s pre-training data set, which contains 1.26 T markers. These include data preprocessing methods, formats, data mixing ratios, as well as architectural details and specific pretraining hyperparameters of the AMBER model. For detailed information, please refer to the project homepage of the LLM360 code base
##AMBER adopts the same model structure as LLaMA 7B4, Table 3 The detailed structural configuration of LLM is summarized. Training hyperparameters. AMBER is trained using the AdamW optimizer, and the hyperparameters are: β₁=0.9, β₂=0.95. In addition, researchers have released several fine-tuned versions of AMBER: AMBERCHAT and AMBERSAFE. AMBERCHAT is fine-tuned based on WizardLM's instruction training data set. For more parameter details, please refer to the original text
In order to achieve the purpose of not changing the original meaning, the content needs to be rewritten into Chinese. The following is a rewrite of "Experiments and Results":
Conduct experiments and result analysis
#The researchers used four benchmark data sets on the Open LLM rankings to evaluate the performance of AMBER. As shown in Figure 4, in the HellaSwag and ARC data sets, the AMBER score gradually increases during the pre-training period, while in the TruthfulQA data set, the score decreases as training proceeds. In the MMLU dataset, the score of AMBER decreases in the initial stage of pre-training and then starts to increase
in Table 4 , the researchers compared the model performance of AMBER with models trained in similar time periods such as OpenLLaMA, RedPajama-INCITE, Falcon, and MPT. Many models are inspired by LLaMA. It can be found that AMBER scores better on MMLU but performs slightly worse on ARC. AMBER's performance is relatively strong compared to other similar models.
CRYSTALCODER
The second member of LLM360 "Big Family" is CrystalCoder.
CrystalCoder is a 7B language model trained on 1.4 T tokens, achieving a balance between coding and language capabilities. Unlike most previous code LLMs, CrystalCoder is trained on a careful mixture of text and code data to maximize utility in both domains. Compared with Code Llama 2, CrystalCoder's code data is introduced earlier in the pre-training process. In addition, the researchers trained CrystalCoder on Python and web programming languages to improve its usefulness as a programming assistant.
Rebuild the model architecture
CrystalCoder adopts a very similar architecture to LLaMA 7B, adding the maximum update parameter Chemistry (muP). In addition to this specific parameterization, the researchers also made some modifications. In addition, the researchers also used LayerNorm instead of RMSNorm because the CG-1 architecture supports efficient calculation of LayerNorm.
#In order to achieve the purpose of not changing the original meaning, the content needs to be rewritten into Chinese. The following is a rewrite of "Experiments and Results": Conduct experiments and result analysis
#On the Open LLM Leaderboard, the researcher conducted a benchmark test on the model, including four benchmark data sets and a coding benchmark data set. As shown in Figure 6
Referring to Table 5, you can see that CrystalCoder has achieved a good balance between language tasks and code tasks
ANALYSIS360
Based on previous research, in-depth research can be carried out by analyzing the intermediate checkpoints of the model. Researchers hope LLM360 will provide the community with a useful reference and research resource. To this end, they released the initial version of the ANALYSIS360 project, an organized repository of multifaceted analyzes of model behavior, including model characteristics and downstream evaluation results
as a An example of analyzing a series of model checkpoints. The researchers conducted a preliminary study on memoization in LLM. Recent research has shown that LLMs may memorize large portions of training data and that this data can be retrieved with appropriate prompts. Not only does this memoization have problems with leaking private training data, but it can also degrade LLM performance if the training data contains repetitions or specificities. The researchers made all checkpoints and data public so that a comprehensive analysis of memorization throughout the training phase can be performed.
The following is the memorization score method used in this article, which is expressed in length The accuracy of the prompt of k followed by the token of length l. For specific memory score settings, please refer to the original article.
The distribution of memorized scores for 10 selected checkpoints is presented in Figure 7
The researchers grouped the data blocks according to the selected checkpoints and plotted each data block for each checkpoint in Figure 8 Memoized score of the group. They found that AMBER checkpoints memorize the latest data better than the previous data. Furthermore, for each data block, the memoization score decreases slightly after additional training, but then continues to increase.
Figure 9 shows the correlation between sequences in memoization scores and extractable k values. It can be seen that there is a strong correlation between checkpoints.
Summary
The researcher summarized the observations and some implications of AMBER and CRYSTALCODER. They say pre-training is a computationally intensive task that many academic labs or small institutions cannot afford. They hope that LLM360 can provide comprehensive knowledge and let users understand what happens during LLM pre-training without having to do it themselves
Please see the original text for more details
The above is the detailed content of All-round open source with no dead ends, Xingbo team's LLM360 makes large models truly transparent. For more information, please follow other related articles on the PHP Chinese website!

The legal tech revolution is gaining momentum, pushing legal professionals to actively embrace AI solutions. Passive resistance is no longer a viable option for those aiming to stay competitive. Why is Technology Adoption Crucial? Legal professional

Many assume interactions with AI are anonymous, a stark contrast to human communication. However, AI actively profiles users during every chat. Every prompt, every word, is analyzed and categorized. Let's explore this critical aspect of the AI revo

A successful artificial intelligence strategy cannot be separated from strong corporate culture support. As Peter Drucker said, business operations depend on people, and so does the success of artificial intelligence. For organizations that actively embrace artificial intelligence, building a corporate culture that adapts to AI is crucial, and it even determines the success or failure of AI strategies. West Monroe recently released a practical guide to building a thriving AI-friendly corporate culture, and here are some key points: 1. Clarify the success model of AI: First of all, we must have a clear vision of how AI can empower business. An ideal AI operation culture can achieve a natural integration of work processes between humans and AI systems. AI is good at certain tasks, while humans are good at creativity and judgment

Meta upgrades AI assistant application, and the era of wearable AI is coming! The app, designed to compete with ChatGPT, offers standard AI features such as text, voice interaction, image generation and web search, but has now added geolocation capabilities for the first time. This means that Meta AI knows where you are and what you are viewing when answering your question. It uses your interests, location, profile and activity information to provide the latest situational information that was not possible before. The app also supports real-time translation, which completely changed the AI experience on Ray-Ban glasses and greatly improved its usefulness. The imposition of tariffs on foreign films is a naked exercise of power over the media and culture. If implemented, this will accelerate toward AI and virtual production

Artificial intelligence is revolutionizing the field of cybercrime, which forces us to learn new defensive skills. Cyber criminals are increasingly using powerful artificial intelligence technologies such as deep forgery and intelligent cyberattacks to fraud and destruction at an unprecedented scale. It is reported that 87% of global businesses have been targeted for AI cybercrime over the past year. So, how can we avoid becoming victims of this wave of smart crimes? Let’s explore how to identify risks and take protective measures at the individual and organizational level. How cybercriminals use artificial intelligence As technology advances, criminals are constantly looking for new ways to attack individuals, businesses and governments. The widespread use of artificial intelligence may be the latest aspect, but its potential harm is unprecedented. In particular, artificial intelligence

The intricate relationship between artificial intelligence (AI) and human intelligence (NI) is best understood as a feedback loop. Humans create AI, training it on data generated by human activity to enhance or replicate human capabilities. This AI

Anthropic's recent statement, highlighting the lack of understanding surrounding cutting-edge AI models, has sparked a heated debate among experts. Is this opacity a genuine technological crisis, or simply a temporary hurdle on the path to more soph

India is a diverse country with a rich tapestry of languages, making seamless communication across regions a persistent challenge. However, Sarvam’s Bulbul-V2 is helping to bridge this gap with its advanced text-to-speech (TTS) t


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

Dreamweaver Mac version
Visual web development tools

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

SublimeText3 Chinese version
Chinese version, very easy to use

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software
