Full review of Gemini: CMU's comprehensive evaluation finds Gemini Pro no match for GPT 3.5 Turbo

How strong is Google's Gemini, really? How does it compare to OpenAI's GPT models? This CMU paper provides clear measurements.


Some time ago, Google released Gemini, a competitor to OpenAI's GPT models. The model comes in three versions: Ultra (the most capable), Pro, and Nano. Test results published by Google's research team show that the Ultra version outperforms GPT-4 on many tasks, while the Pro version is on par with GPT-3.5.

Although these comparative results are significant for large-language-model research, the exact evaluation details and model predictions have not been made public, which limits replication and reproduction of the tests and makes it difficult to analyze the details behind the results.

In order to understand the true strength of Gemini, researchers from Carnegie Mellon University and BerriAI conducted an in-depth exploration of the model’s language understanding and generation capabilities.
They tested the text understanding and generation capabilities of Gemini Pro, GPT 3.5 Turbo, GPT 4 Turbo, and Mixtral on ten datasets. Specifically, they tested knowledge-based question answering on MMLU; reasoning on BIG-Bench Hard; mathematical problem solving on datasets such as GSM8K; translation on FLORES; code generation on datasets such as HumanEval; and the ability to act as an instruction-following agent on WebArena.
Table 1 below shows the main comparison results. Overall, as of the paper's publication date, Gemini Pro's accuracy is close to GPT 3.5 Turbo's across all tasks, but still slightly lower. The authors also found that both Gemini and GPT outperform the open-source competitor Mixtral.
In the paper, the author provides an in-depth description and analysis of each task. All results and reproducible code can be found at: https://github.com/neulab/gemini-benchmark
Paper link: https://arxiv.org/pdf/2312.11444.pdf

Experimental settings

The authors chose four models as test subjects: Gemini Pro, GPT 3.5 Turbo, GPT 4 Turbo, and Mixtral.
Because previous studies used different experimental settings in their evaluations, the authors re-ran the experiments using exactly the same prompts and evaluation protocol for every model to ensure a fair test. For most assessments they used prompts and rubrics from standard repositories, such as the resources released with each dataset and the Eleuther evaluation harness. Prompts typically include the query, the input, a small number of examples, and chain-of-thought reasoning. For a few assessments, the authors found minor adjustments to standard practice necessary; these adjustments are documented in the corresponding code repository and the original paper.
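As an illustration of the prompt structure just described (a query plus a few worked examples with chain-of-thought reasoning), such a prompt can be assembled roughly as follows; the exemplar text and separators are illustrative, not the standard repository's exact format:

```python
def build_cot_prompt(exemplars, question):
    """Assemble a few-shot chain-of-thought prompt.

    exemplars: list of (question, reasoning, answer) triples shown
    to the model before the real question.
    """
    parts = []
    for q, reasoning, answer in exemplars:
        parts.append(f"Q: {q}\nA: {reasoning} The answer is {answer}.")
    parts.append(f"Q: {question}\nA:")  # the model continues from here
    return "\n\n".join(parts)

# Hypothetical exemplar; real evaluations use several per prompt.
exemplars = [
    ("What is 2 + 3?", "2 plus 3 equals 5.", "5"),
]
prompt = build_cot_prompt(exemplars, "What is 7 + 8?")
```

The same builder serves all models under test, which is what makes the comparison apples-to-apples.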

The goals of this research are as follows:

1. Provide a third-party, objective comparison of the capabilities of the OpenAI GPT and Google Gemini models, with reproducible code and fully transparent results.
2. Examine the evaluation results in depth and analyze the fields in which each model performs best.

Knowledge-based QA

The authors selected from MMLU 57 knowledge-based multiple-choice question-answering tasks covering topics across STEM, the humanities, and the social sciences. MMLU has 14,042 test samples in total and has been widely used to provide an overall assessment of the knowledge capabilities of large language models.

The authors compared the four tested models' overall performance on MMLU (shown in the figure below), their performance on sub-tasks, and the impact of output length on performance.
Figure 1: Overall accuracy of each model on MMLU using 5-shot and chain-of-thought prompts.

As the figure shows, Gemini Pro's accuracy is lower than GPT 3.5 Turbo's and much lower than GPT 4 Turbo's. With chain-of-thought prompting, there is little difference in any model's performance. The authors speculate that this is because MMLU primarily contains knowledge-based question-answering tasks, which may not benefit significantly from stronger reasoning-oriented prompts.

It is worth noting that all MMLU questions are multiple choice, with four potential answers, A through D, arranged in order. The graph below shows the proportion of each answer option selected by each model. Gemini's answer distribution is heavily skewed toward the last option, D, which contrasts with the more balanced distributions of the GPT versions. This may indicate that Gemini has not undergone the extensive instruction tuning associated with multiple-choice questions, leaving the model biased in how it ranks answers.
Figure 2: Proportion of answers to multiple-choice questions predicted by the tested models.
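The skew visible in Figure 2 is easy to quantify: count how often each option letter appears among the model's predictions. A minimal sketch (the predictions list is hypothetical):

```python
from collections import Counter

def option_distribution(predictions):
    """Fraction of times each multiple-choice option (A-D) was picked."""
    counts = Counter(predictions)
    total = len(predictions)
    return {opt: counts.get(opt, 0) / total for opt in "ABCD"}

# Hypothetical model outputs illustrating a D-skewed distribution.
preds = ["D", "D", "C", "D", "A", "D", "B", "D"]
dist = option_distribution(preds)
```

A uniform distribution over balanced data would put each option near 0.25; a heavy tilt toward one letter signals positional bias rather than knowledge.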

The following figure shows the tested models' performance on the sub-tasks of the MMLU test set. Gemini Pro performs worse than GPT 3.5 on most tasks. Chain-of-thought prompting reduces the variance between subtasks.
Figure 3: Accuracy of the tested models on each subtask.

The authors then take a deep dive into Gemini Pro's strengths and weaknesses. As Figure 4 shows, Gemini Pro lags behind GPT 3.5 on the human_sexuality (social sciences), formal_logic (humanities), elementary_mathematics (STEM), and professional_medicine (professional fields) tasks. Even on the two tasks where Gemini Pro is better, its lead is slim.
Figure 4: Relative advantages of Gemini Pro and GPT 3.5 on MMLU tasks.

Gemini Pro's poor performance on specific tasks can be attributed to two causes. First, there are cases where Gemini returns no answer at all: in most MMLU subtasks the API's response rate exceeds 95%, but it is significantly lower on the moral scenarios (85%) and human_sexuality (28%) tasks. This suggests that some of Gemini's low task scores are due to input content filters. Second, Gemini Pro is slightly worse at the basic mathematical reasoning required by the formal logic and elementary mathematics tasks.

The authors also analyzed how output length under chain-of-thought prompting affects model performance, as shown in Figure 5. In general, more powerful models perform more complex reasoning and therefore produce longer answers. Gemini Pro has one noteworthy advantage over its rivals: its accuracy is less affected by output length, and it even outperforms GPT 3.5 when the output length exceeds 900. However, unlike GPT 4 Turbo, Gemini Pro and GPT 3.5 Turbo rarely produce such long reasoning chains.
Figure 5: Output-length analysis of the tested models on MMLU.
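An output-length analysis like Figure 5's can be reproduced by bucketing examples by the length of the model's answer; a sketch with hypothetical data and illustrative bucket edges:

```python
def accuracy_by_length(examples, edges=(300, 600, 900)):
    """Group (output_length, correct) pairs into length buckets and
    compute accuracy per bucket. Bucket edges are illustrative."""
    buckets = {}
    for length, correct in examples:
        # Label with the first edge the length falls under,
        # or a catch-all for the longest outputs.
        label = next((f"<{e}" for e in edges if length < e), f">={edges[-1]}")
        buckets.setdefault(label, []).append(correct)
    return {k: sum(v) / len(v) for k, v in buckets.items()}

# Hypothetical (length, correct) records.
data = [(120, True), (450, True), (450, False), (950, True)]
acc = accuracy_by_length(data)
```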

General-purpose Reasoning

On the BIG-Bench Hard test set, the authors evaluated the models' general reasoning ability. BIG-Bench Hard contains 27 different reasoning tasks, covering arithmetic, symbolic, and multilingual reasoning, factual knowledge understanding, and more. Most tasks consist of 250 question-answer pairs; a few have slightly fewer.

Figure 6 shows the overall accuracy of the tested model. It can be seen that the accuracy of Gemini Pro is slightly lower than GPT 3.5 Turbo and much lower than GPT 4 Turbo. In comparison, the accuracy of the Mixtral model is much lower.
Figure 6: Overall accuracy of the tested model on BIG-Bench-Hard.

The authors dug deeper into why Gemini's general reasoning performance is weak overall. First, they examined accuracy by question length. As Figure 7 shows, Gemini Pro performs poorly on longer, more complex questions. The GPT models, particularly GPT 4 Turbo, degrade very little even on very long questions, showing that they are robust and capable of understanding longer, more complex queries; GPT 3.5 Turbo's robustness is middling. Mixtral is stable across question lengths but has lower overall accuracy.
Figure 7: Accuracy of the tested models on BIG-Bench-Hard by question length.

The authors also analyzed whether the tested models' accuracy differs on specific BIG-Bench-Hard tasks. Figure 8 shows the tasks on which GPT 3.5 Turbo outperforms Gemini Pro.

Gemini Pro performed particularly poorly on the tracking shuffled objects tasks. These tasks describe people exchanging items and ask the model to track who ends up with what, and Gemini Pro often struggled to keep the order straight.
Figure 8: BIG-Bench-Hard subtasks on which GPT 3.5 Turbo outperforms Gemini Pro.

Gemini Pro also trails Mixtral on tasks such as arithmetic problems requiring multi-step solutions and finding errors in translations.

There are also tasks where Gemini Pro is better than GPT 3.5 Turbo. Figure 9 shows the six tasks where Gemini Pro leads GPT 3.5 Turbo by the largest margin. The tasks are heterogeneous and include those requiring world knowledge (sports_understanding), manipulating symbol stacks (dyck_languages), sorting words alphabetically (word_sorting), and parsing tables (penguins_in_a_table).
Figure 9: BIG-Bench-Hard subtasks on which Gemini Pro outperforms GPT 3.5.

The authors further analyzed the tested models' robustness across answer types, as shown in Figure 10. Gemini Pro performed worst on the "valid/invalid" answer type, which belongs to the formal_fallacies task; interestingly, 68.4% of the questions in this task received no response. On other answer types (covering the word_sorting and dyck_languages tasks), however, Gemini Pro outperforms all the GPT models and Mixtral: it is particularly good at rearranging words and generating symbols in the correct order. For multiple-choice answers, 4.39% of the questions had their responses blocked by Gemini Pro; the GPT models excel in this area, and Gemini Pro struggles to compete with them.
Figure 10: Accuracy of the tested models by answer type on BIG-Bench-Hard.

In short, no single model leads on every task, so when performing general-purpose reasoning tasks it is worth trying both the Gemini and GPT models before deciding which one to use.

Mathematical ability

To evaluate the tested models' mathematical reasoning ability, the authors chose four math benchmarks:

(1) GSM8K: a grade-school math benchmark;
(2) SVAMP: generated by varying word order, to check for robust reasoning;
(3) ASDIV: featuring diverse language patterns and question types;
(4) MAWPS: containing arithmetic and algebraic word problems.
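Scoring these benchmarks usually comes down to extracting the final number from the model's chain-of-thought output and comparing it with the reference answer; a minimal sketch (the extraction heuristic is illustrative, not the paper's exact protocol):

```python
import re

def extract_final_number(text):
    """Return the last number mentioned in a model's answer, or None."""
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

def is_correct(model_output, reference):
    """Compare the extracted answer numerically against the reference."""
    pred = extract_final_number(model_output)
    return pred is not None and float(pred) == float(reference)

output = "There are 3 boxes with 4 apples each, so 3 * 4 = 12 apples."
```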

The authors compared the accuracy of Gemini Pro, GPT 3.5 Turbo, GPT 4 Turbo, and Mixtral on the four math test sets, examining overall performance, performance across question complexity, and performance at different chain-of-thought depths.

Figure 11 presents the overall results. On GSM8K, SVAMP, and ASDIV, which feature diverse language patterns, Gemini Pro's accuracy is slightly lower than GPT 3.5 Turbo's and much lower than GPT 4 Turbo's. On MAWPS, all tested models achieve over 90% accuracy, though Gemini Pro still trails the GPT models slightly; on this task GPT 3.5 Turbo narrowly outperforms GPT 4 Turbo. Mixtral's accuracy, by comparison, is much lower than the other models'.
Figure 11: Overall accuracy of the tested models on the four mathematical reasoning test sets.

Figure 12 shows each model's robustness to question length. As with the BIG-Bench Hard reasoning tasks, the tested models' accuracy drops on longer questions. GPT 3.5 Turbo beats Gemini Pro on shorter questions but degrades faster; on longer questions Gemini Pro approaches GPT 3.5 Turbo's accuracy while still lagging slightly.
Figure 12: Accuracy of the tested models across question lengths on the four mathematical reasoning test sets.

Additionally, the authors observed differences in accuracy when the answer requires a longer chain of thought. As Figure 13 shows, GPT 4 Turbo remains very robust even with long reasoning chains, while GPT 3.5 Turbo, Gemini Pro, and Mixtral show limitations as CoT length increases. The authors also found that Gemini Pro outperformed GPT 3.5 Turbo on complex examples with CoT lengths over 100, but underperformed on shorter examples.
Figure 13: Accuracy of each model on GSM8K at different chain-of-thought lengths.

Figure 14 shows the accuracy of the tested models when generating answers with different numbers of digits. The authors created three "buckets" based on whether the answer contains 1, 2, or 3 or more digits (except for MAWPS, which has no answers with more than two digits). As shown, GPT 3.5 Turbo is more robust on multi-digit math problems, while Gemini Pro degrades as the number of digits grows.
Figure 14: Accuracy of each model on the four mathematical reasoning test sets by number of digits in the answer.
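The digit buckets in Figure 14 can be derived directly from the reference answers; a minimal sketch:

```python
def digit_bucket(answer):
    """Assign an integer answer to a bucket by digit count: '1', '2', or '3+'."""
    digits = len(str(abs(int(answer))))
    return str(digits) if digits <= 2 else "3+"
```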

Code generation

In this section, the authors use two code-generation datasets, HumanEval and ODEX, to examine the models' coding ability. The former tests basic code understanding over a limited set of Python standard-library functions; the latter tests the ability to use a broader set of libraries from across the Python ecosystem. The input for both is a task instruction written in English (usually with test cases); the problems evaluate the models' language understanding, algorithmic understanding, and elementary mathematics. In total, HumanEval has 164 test samples and ODEX has 439.

First, the overall results in Figure 15 show that Gemini Pro's Pass@1 scores on both tasks are lower than GPT 3.5 Turbo's and much lower than GPT 4 Turbo's. These results indicate that Gemini's code-generation capabilities still leave room for improvement.
Figure 15: Overall accuracy of each model on the code-generation tasks.
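Pass@1 is the fraction of problems for which a generated program passes all test cases. More generally, the standard unbiased Pass@k estimator used with HumanEval-style benchmarks (n samples drawn, c of them correct) can be sketched as:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimator.

    pass@k = 1 - C(n - c, k) / C(n, k): the probability that at least
    one of k samples drawn (without replacement) from n is correct.
    """
    if n - c < k:
        return 1.0  # too few incorrect samples to fill a k-draw
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With a single sample per problem (n = k = 1), this reduces to plain accuracy, which is the Pass@1 reported in Figure 15.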

Second, the authors analyze the relationship between gold-solution length and model performance in Figure 16(a). Solution length indicates, to some extent, the difficulty of the corresponding code-generation task. The authors find that Gemini Pro achieves Pass@1 scores comparable to GPT 3.5's when the solution length is below 100 (the easier cases), but lags significantly as solutions get longer. This is an interesting contrast with the earlier sections, where Gemini Pro was generally robust to longer inputs and outputs on English-language tasks.

The authors also analyzed the impact of the libraries required by each solution on model performance in Figure 16(b). For most libraries, such as mock, pandas, numpy, and datetime, Gemini Pro performs worse than GPT 3.5. On matplotlib cases, however, it outperforms both GPT 3.5 and GPT 4, indicating a stronger ability to produce plot visualizations through code.

Finally, the authors show several specific failure cases where Gemini Pro's code generation falls short of GPT 3.5's. First, they noticed that Gemini is slightly worse at correctly selecting functions and parameters from the Python API: in one example, Gemini Pro generated code that produced a type-mismatch error, whereas GPT 3.5 Turbo's code achieved the desired effect. In addition, Gemini Pro has a higher rate of errors in which the generated code is syntactically correct and executes, but fails to match a more complex intent: in one case, Gemini Pro produced an implementation that only extracted the unique numbers, rather than removing the numbers that appear multiple times.
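That second failure mode amounts to confusing "return the unique values" with "keep only the values that occur exactly once". A hedged reconstruction of the two behaviors (illustrative, not the models' verbatim code):

```python
from collections import Counter

def deduplicate(numbers):
    """The wrong reading of the task: keep one copy of each value
    (i.e., return the unique values) -- the behavior described for Gemini Pro."""
    seen, out = set(), []
    for n in numbers:
        if n not in seen:
            seen.add(n)
            out.append(n)
    return out

def drop_repeated(numbers):
    """The intended behavior: keep only values that occur exactly once."""
    counts = Counter(numbers)
    return [n for n in numbers if counts[n] == 1]

nums = [1, 2, 3, 2, 4]
```

Both functions run without error on any test input, which is exactly why this class of mistake survives syntactic checks and only shows up under the task's unit tests.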
Machine Translation

This set of experiments used the FLORES-200 machine-translation benchmark to evaluate the models' multilingual capabilities, especially translation between various language pairs. The authors focus on a subset of 20 languages drawn from Robinson et al.'s (2023) analysis, covering varying levels of resource availability and translation difficulty. They evaluated the 1,012 test-set sentences for every selected language pair.

In Tables 4 and 5, the authors compare Gemini Pro, GPT 3.5 Turbo, and GPT 4 Turbo against mature systems such as Google Translate, and additionally benchmark NLLB-MoE, a leading open-source machine-translation model known for its broad language coverage. The results show that Google Translate performs best overall, leading on 9 languages, followed by NLLB, which leads on 6 and 8 languages under the 0-shot and 5-shot settings respectively. The general-purpose language models are competitive but have not yet surpassed dedicated machine-translation systems at translating into non-English languages.
Table 4: Machine-translation performance (chrF (%) score) of each model across all languages using 0-shot prompts. The best score is shown in bold and the second best is underlined.
Table 5: Machine-translation performance (chrF (%) score) of each model across all languages using 5-shot prompts. The best score is shown in bold and the second best is underlined.
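chrF, the score reported in Tables 4 and 5, measures character n-gram overlap between a hypothesis translation and its reference. A simplified single-order sketch (the full metric averages n-gram orders up to 6 and uses a recall-weighted F-beta; this version is plain F1 at one order):

```python
from collections import Counter

def char_ngrams(text, n):
    """Count character n-grams; chrF conventionally ignores whitespace."""
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf_order(hypothesis, reference, n=3, eps=1e-16):
    """Character n-gram F1 for a single order n (simplified chrF)."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    precision = overlap / max(sum(hyp.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return 2 * precision * recall / max(precision + recall, eps)
```

Because it works at the character level, chrF rewards partially correct word forms, which makes it more forgiving than word-level metrics for morphologically rich languages.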

Figure 17 compares the general-purpose language models across language pairs. Like NLLB, GPT 4 Turbo shows a consistent performance gap over GPT 3.5 Turbo and Gemini Pro, with larger improvements on low-resource languages; on high-resource languages the LLMs perform similarly. Gemini Pro outperformed GPT 3.5 Turbo and GPT 4 Turbo on 8 of the 20 languages and achieved the top score on 4 of them. However, Gemini Pro showed a strong tendency to block responses on about 10 of the language pairs.
Figure 17: Machine-translation performance (chrF (%) score) by language pair.

Figure 18 shows that Gemini Pro's lower performance on these languages stems from its tendency to block responses in lower-confidence scenarios. A response is counted as "blocked" if Gemini Pro produces a "Blocked Response" error in either the 0-shot or the 5-shot configuration.
Figure 18: Number of samples blocked by Gemini Pro.

A closer look at Figure 19 shows that, on the unblocked, higher-confidence samples, Gemini Pro is slightly better than GPT 3.5 Turbo and GPT 4 Turbo. Specifically, it outperforms GPT 4 Turbo by 1.6 chrF and 2.6 chrF in the 5-shot and 0-shot settings respectively, and GPT 3.5 Turbo by 2.7 chrF and 2 chrF. However, the authors' preliminary analysis of GPT 4 Turbo's and GPT 3.5 Turbo's performance on the blocked samples shows that these samples are generally harder to translate, and Gemini Pro performs poorly on them; notably, some samples are blocked in the 0-shot setting but not the 5-shot setting, and vice versa.
Figure 19: chrF performance (%) on blocked and unblocked samples.

Throughout the analysis, the authors observed that few-shot prompting generally yields modest improvements in average performance, with GPT 4 Turbo showing the smallest variance across settings.

Figure 20 shows clear trends by language family and script. One important observation is that Gemini Pro is competitive with the other models on Cyrillic script but performs worse on other scripts. GPT 4 performs outstandingly across scripts, outperforming the other models, and few-shot prompting is especially effective for it; this effect is particularly pronounced for languages written in Devanagari script.
Figure 20: Performance of each model on different scripts (chrF (%)).

Web Agent

Finally, the authors examined each model's ability to act as a web-navigating agent, a task that requires long-horizon planning and complex data understanding. They used WebArena, a simulation environment in which success is measured by execution results. Tasks assigned to the agent include information seeking, site navigation, and content and configuration manipulation, and they span a variety of websites: e-commerce platforms, social forums, collaborative software-development platforms (such as GitLab), content-management systems, and online maps.

The authors tested Gemini Pro's overall success rate, success rates on different task types, response length, trajectory steps, and tendency to predict task failure. Table 6 lists overall performance: Gemini Pro comes close to, but remains slightly behind, GPT 3.5 Turbo. Like GPT 3.5 Turbo, Gemini Pro performs better when the prompt mentions that the task may be impossible to complete (a "UA hint"); with the UA hint, Gemini Pro's overall success rate is 7.09%.
Table 6: Performance of each model on WebArena.

Broken down by website type, as shown in Figure 21, Gemini Pro performs worse than GPT 3.5 Turbo on GitLab and maps, while its performance on shopping-admin, Reddit, and shopping sites is close to GPT 3.5 Turbo's. Gemini Pro outperforms GPT 3.5 Turbo on multi-site tasks, consistent with the earlier results showing that Gemini does slightly better on more complex subtasks across the benchmarks.
Figure 21: Web-agent success rate of each model on different types of websites.

As Figure 22 shows, Gemini Pro generally predicts that more tasks are impossible to complete, especially when given the UA hint: it predicted more than 80.6% of tasks to be unachievable with the UA hint, compared with only 47.7% for GPT 3.5 Turbo. Note that only 4.4% of the tasks in the dataset are actually unachievable, so both models vastly overestimate the true number.
Figure 22: Number of tasks predicted unachievable (UA).

At the same time, the authors observed that Gemini Pro tends to reply with shorter phrases and to take fewer steps before reaching a conclusion. As Figure 23(a) shows, more than half of Gemini Pro's trajectories have fewer than 10 steps, whereas most GPT 3.5 Turbo and GPT 4 Turbo trajectories take between 10 and 30 steps. Likewise, most of Gemini's replies are under 100 characters, while most replies from GPT 3.5 Turbo, GPT 4 Turbo, and Mixtral exceed 300 characters (Figure 23(b)). Gemini tends to predict actions directly, whereas the other models reason first and then predict an action.
Figure 23: Model behavior on WebArena.

Please refer to the original paper for more details.


Statement
This article is reproduced from 机器之心 (Machine Heart).