How smooth is GPT-4's performance? Can it surpass human writing?

Summarization is a natural language generation (NLG) task whose main goal is to compress long texts into short summaries. It can be applied to many kinds of content, such as news articles, source code, and cross-lingual texts.

With the emergence of large language models (LLMs), the traditional approach of fine-tuning models on specific datasets is no longer applicable.

This raises a natural question: how effective are LLMs at generating summaries?

To answer this question, researchers from Peking University examined the issue in detail in the paper "Summarization is (Almost) Dead". They evaluated LLM performance on a range of summarization tasks (single-news, multi-news, dialogue, source code, and cross-lingual summarization) using human-created evaluation datasets. Quantitative and qualitative comparisons of LLM-generated summaries, human-written summaries, and summaries produced by fine-tuned models showed that human evaluators significantly favored the LLM-generated summaries.

After sampling and examining 100 summarization-related papers published at ACL, EMNLP, NAACL, and COLING over the past three years, the researchers found that the main contribution of about 70% of them was to propose a new summarization method and verify its effectiveness on standard datasets. Since LLMs now outperform such methods, the study concludes that "Summarization is (Almost) Dead".

Despite this, the researchers note that the field still faces open challenges, such as the need for higher-quality reference datasets and better evaluation methods.

Paper link: https://arxiv.org/pdf/2309.09558.pdf

Methods and results

The study builds new datasets from the latest available data, each consisting of 50 samples.

For the single-news, multi-news, and dialogue summarization tasks, the researchers built their datasets by imitating the construction methods of the CNN/DailyMail and Multi-News datasets. For the cross-lingual summarization task, they adopted the same strategy proposed by Zhu et al., and for the code summarization task, they followed the method proposed by Bahrami et al.

With the datasets in place, the next step was choosing the fine-tuned models to compare against. Specifically, the study uses BART and T5 for single-news summarization; Pegasus and BART for multi-news summarization; T5 and BART for dialogue summarization; mT5 and mBART for cross-lingual summarization; and CodeT5 for source code summarization.

In the experiment, human evaluators compared the overall quality of the different summaries. As Figure 1 shows, LLM-generated summaries outperform both human-written summaries and summaries generated by the fine-tuned models across all tasks.

This raises the question of why LLMs can outperform human-written summaries, which have traditionally been regarded as flawless. Preliminary observations indicate that LLM-generated summaries are highly fluent and coherent.

The paper further recruits annotators to identify hallucinations in the sentences of human-written and LLM-generated summaries. As Table 1 shows, human-written summaries contain as many hallucinations as, or more than, summaries generated by GPT-4. In specific tasks such as multi-news and code summarization, human-written summaries show markedly poorer factual consistency.

Table 2 shows the proportion of hallucinations in human-written summaries and in GPT-4-generated summaries.
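The article does not spell out how the hallucination proportion is computed. As a rough illustration only, the Python sketch below assumes each summary sentence carries a binary hallucinated/not-hallucinated label from an annotator, computes the share of flagged sentences per summary, and macro-averages over a dataset. `AnnotatedSentence`, `hallucination_proportion`, and `mean_proportion` are hypothetical names, and the toy annotations are invented, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedSentence:
    """One summary sentence plus an annotator's judgment (hypothetical schema)."""
    text: str
    hallucinated: bool


def hallucination_proportion(sentences: list[AnnotatedSentence]) -> float:
    """Return the fraction of sentences an annotator marked as hallucinated."""
    if not sentences:
        raise ValueError("summary has no annotated sentences")
    flagged = sum(1 for s in sentences if s.hallucinated)
    return flagged / len(sentences)


def mean_proportion(summaries: list[list[AnnotatedSentence]]) -> float:
    """Macro-average: average the per-summary proportions over a dataset."""
    if not summaries:
        raise ValueError("no summaries given")
    return sum(hallucination_proportion(s) for s in summaries) / len(summaries)


if __name__ == "__main__":
    # Toy annotations, invented for illustration only (not the paper's data).
    human = [
        [AnnotatedSentence("A", False), AnnotatedSentence("B", True)],
        [AnnotatedSentence("C", False), AnnotatedSentence("D", False)],
    ]
    gpt4 = [
        [AnnotatedSentence("E", False), AnnotatedSentence("F", False)],
        [AnnotatedSentence("G", False), AnnotatedSentence("H", False)],
    ]
    print(f"human: {mean_proportion(human):.2f}")  # human: 0.25
    print(f"gpt-4: {mean_proportion(gpt4):.2f}")   # gpt-4: 0.00
```

Pooling all sentences before dividing (a micro-average) is an equally plausible choice; the two give different numbers when summaries vary in length.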

The study also found that human-written reference summaries can lack fluency. As Figure 2(a) shows, human-written reference summaries sometimes omit information, and as Figure 2(b) shows, some human-written reference summaries contain hallucinations.

The study also found that summaries generated by fine-tuned models usually have a fixed, rigid length, whereas LLMs can adjust output length to the input. Furthermore, when the input covers multiple topics, summaries from the fine-tuned models cover few of them, as shown in Figure 3, while LLMs capture all the topics in their summaries.
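One simple way to make "topic coverage" concrete is a ratio: of the topics present in the input, what fraction does the summary mention? The sketch below assumes topic labels have already been assigned (for example, by annotators). It is not the paper's procedure; `topic_coverage` and the example topic sets are hypothetical.

```python
def topic_coverage(input_topics: set[str], summary_topics: set[str]) -> float:
    """Fraction of the input's topics that also appear in the summary.

    Assumes topic labels were assigned beforehand (hypothetical setup).
    Topics that appear only in the summary are ignored.
    """
    if not input_topics:
        raise ValueError("input must contain at least one topic")
    return len(input_topics & summary_topics) / len(input_topics)


if __name__ == "__main__":
    # Invented example: one multi-topic input and two hypothetical summaries.
    topics = {"election", "weather", "sports"}
    narrow_summary = {"election"}
    broad_summary = {"election", "weather", "sports"}
    print(round(topic_coverage(topics, narrow_summary), 2))  # 0.33
    print(topic_coverage(topics, broad_summary))             # 1.0
```

Under this definition, a summary that mentions only one of three input topics scores about 0.33, which is the kind of gap the coverage comparison describes.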


As Figure 4 shows, the human preference score for LLM-generated summaries exceeds 50%, indicating that evaluators strongly prefer them and highlighting LLMs' strength in text summarization.
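The article does not define the human preference score. One common reading, assumed here, is a pairwise win rate: among all comparisons that include a given system, the fraction in which evaluators chose its summary. Under that reading, a score above 50% means the system wins more comparisons than it loses. The `Judgment` type, `preference_score`, and the votes below are hypothetical, and ties are assumed to be dropped before scoring.

```python
from typing import NamedTuple


class Judgment(NamedTuple):
    """One pairwise comparison and the system the evaluator preferred (hypothetical)."""
    system_a: str
    system_b: str
    winner: str


def preference_score(judgments: list[Judgment], system: str) -> float:
    """Fraction of comparisons involving `system` in which it was preferred.

    Assumes ties were filtered out beforehand, so every judgment has a winner.
    """
    relevant = [j for j in judgments if system in (j.system_a, j.system_b)]
    if not relevant:
        raise ValueError(f"no comparisons involve {system!r}")
    wins = sum(1 for j in relevant if j.winner == system)
    return wins / len(relevant)


if __name__ == "__main__":
    # Toy judgments, invented for illustration (not the paper's data).
    votes = [
        Judgment("llm", "human", "llm"),
        Judgment("llm", "human", "human"),
        Judgment("llm", "fine-tuned", "llm"),
        Judgment("llm", "fine-tuned", "llm"),
    ]
    print(preference_score(votes, "llm"))    # 0.75
    print(preference_score(votes, "human"))  # 0.5
```

Restricting the denominator to comparisons that involve the system keeps unrelated pairs (for example, human vs. fine-tuned) from diluting its score.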


This article is reproduced from 51CTO.COM.