


2022 can be said to be the first year of generative AI. Recently, Yu Shilun’s team published a comprehensive survey on AIGC, introducing the development history from GAN to ChatGPT.
The year 2022 that has just passed is undoubtedly the singular point of the explosion of generative AI.
Since 2021, generative AI has been selected into Gartner's "Artificial Intelligence Technology Hype Cycle" for two consecutive years and is considered an important AI technology trend in the future.
Recently, Yu Shilun’s team published a comprehensive survey on AIGC, introducing the development history from GAN to ChatGPT.
Paper address: https://arxiv.org/pdf/2303.04226.pdf
This article excerpts part of the paper for introduction.
The singularity has arrived?
In recent years, artificial intelligence-generated content (AIGC, also known as generative AI) has attracted widespread attention outside the computer science community.
The entire society has begun to take great interest in various content generation products developed by large technology companies, such as ChatGPT and DALL-E-2.
AIGC refers to the use of generative artificial intelligence (GAI) technology to generate content and can automatically create a large amount of content in a short time.
ChatGPT is an AI system developed by OpenAI for building conversations. The system is able to effectively understand and respond to human language in a meaningful way.
In addition, DALL-E-2 is another state-of-the-art GAI model developed by OpenAI, capable of creating unique high-quality images from text descriptions in minutes.
Example of AIGC in image generation
Technically speaking, AIGC refers to given instructions that can guide the model to complete the task, using GAI to generate satisfying The content of the instruction. This generation process usually consists of two steps: extracting intent information from instructions, and generating content based on the extracted intent.
However, as previous research has proven, the paradigm of the GAI model including the above two steps is not completely novel.
Compared with previous work, the core point of recent AIGC advancements is to train more complex generative models on larger data sets, use larger base model frameworks, and have access to a wide range of computing resources.
For example, the main framework of GPT-3 is the same as GPT-2, but the pre-training data size increases from WebText (38GB) to CommonCrawl (570GB after filtering), and the basic model size increases from 1.5B to 175B.
Therefore, GPT-3 has better generalization ability than GPT-2 on various tasks.
In addition to the benefits of increased data volumes and computing power, researchers are also exploring ways to combine new technologies with GAI algorithms.
For example, ChatGPT utilizes reinforcement learning with human feedback (RLHF) to determine the most appropriate response to a given instruction, thereby improving the model’s reliability and accuracy over time. This approach enables ChatGPT to better understand human preferences in long conversations.
At the same time, in CV, Stable Diffusion proposed by Stability AI in 2022 has also achieved great success in image generation.
Unlike previous methods, generative diffusion models can help generate high-resolution images by controlling the balance between exploration and exploitation, thereby achieving diversity in the generated images, harmony with the similarity of the training data combination.
By combining these advances, the model has made significant progress in AIGC's mission and has been adopted by industries as diverse as art, advertising, and education.
In the near future, AIGC will continue to become an important area of machine learning research.
Generally speaking, GAI models can be divided into two types: single-modal model and multi-modal model
Therefore, conduct a comprehensive review of past research and find out this Problems in the field are crucial. This is the first survey focusing on core technologies and applications in the AIGC field.
This is the first comprehensive survey of AIGC summarizing GAI in terms of technology and applications.
Previous surveys mainly introduced GAI from different perspectives, including natural language generation, image generation, and multi-modal machine learning generation. However, these previous works only focused on specific parts of AIGC.
In this survey, we first reviewed the basic technologies commonly used in AIGC. Then, a comprehensive summary of advanced GAI algorithms is further provided, including unimodal and multimodal generation. Additionally, the paper examines the applications and potential challenges of AIGC.
Finally, the future direction of this field is emphasized. In summary, the main contributions of this paper are as follows: - To the best of our knowledge, we are the first to provide a formal definition and comprehensive survey of AIGC and AI-augmented generative processes.
-We reviewed the history and basic technology of AIGC, and conducted a comprehensive analysis of the latest progress in GAI tasks and models from the perspectives of unimodal generation and multimodal generation.
-This article discusses the main challenges facing AIGC and future research trends.
Generative AI History
Generative models have a long history in artificial intelligence, dating back to the development of Hidden Markov Models (HMMs) and Gaussian Mixture Models (GMMs) in the 1950s.
These models generate continuous data such as speech and time series. However, it was not until the advent of deep learning that the performance of generative models improved significantly.
In early deep generative models, different domains usually did not overlap much.
The development history of generative AI in CV, NLP and VL
In NLP, the traditional method of generating sentences is to use N-gram language model learning distribution of words, and then search for the best sequence. However, this method cannot effectively adapt to long sentences.
To solve this problem, Recurrent Neural Networks (RNNs) were later introduced to language modeling tasks, allowing relatively long dependencies to be modeled.
The second is the development of long short-term memory (LSTM) and gated recurrent units (GRU), which use gating mechanisms to control memory during training. These methods are able to handle approximately 200 tokens in a sample, which marks a significant improvement compared to N-gram language models.
At the same time, in CV, before the emergence of deep learning-based methods, traditional image generation algorithms used techniques such as texture synthesis (PTS) and texture mapping.
These algorithms are based on hand-designed features and have limited capabilities in generating complex and diverse images.
In 2014, Generative Adversarial Networks (GANs) were first proposed and became a milestone in the field of artificial intelligence because of their impressive results in various applications.
Variant autoencoders (VAEs) and other methods, such as generative diffusion models, have also been developed to provide more fine-grained control over the image generation process and enable the generation of high-quality images.
The development of generative models in different fields has followed different paths, but eventually an intersection emerged: the Transformer architecture.
In 2017, Transformer was introduced in NLP tasks by Vaswani et al., and was later applied to CV, and then became the dominant architecture for many generative models in various fields.
In the field of NLP, many well-known large-scale language models, such as BERT and GPT, adopt the Transformer architecture as their main building block. Advantages compared to previous building blocks, namely LSTM and GRU.
In CV, Vision Transformer (ViT) and Swin Transformer later developed this concept further, combining the Transformer architecture with a vision component, enabling it to be applied to image-based downlink systems.
In addition to the improvements brought by Transformer to a single modality, this crossover also enables models from different fields to be fused together to perform multi-modal tasks.
An example of a multimodal model is CLIP. CLIP is a joint visual language model. It combines the Transformer architecture with a visual component, allowing training on large amounts of text and image data.
Due to combining visual and linguistic knowledge in pre-training, CLIP can also be used as an image encoder in multi-modal cue generation. In short, the emergence of Transformer-based models has revolutionized the generation of artificial intelligence and led to the possibility of large-scale training.
In recent years, researchers have also begun to introduce new technologies based on these models.
For example, in NLP, in order to help the model better understand task requirements, people sometimes prefer few-shot hints. It refers to including in the prompt some examples selected from the dataset.
In visual languages, researchers combine pattern-specific models with self-supervised contrastive learning goals to provide more powerful representations.
In the future, as AIGC becomes more and more important, more and more technologies will be introduced, which will give this field great vitality.
AIGC Basics
This section introduces the commonly used basic models of AIGC.
Basic Model
Transformer
Transformer is the backbone architecture of many state-of-the-art models, such as GPT-3, DALL-E-2, Codex and Gopher.
It was first proposed to solve the limitations of traditional models, such as RNNs, in processing variable-length sequences and context awareness.
The architecture of Transformer is mainly based on a self-attention mechanism, which enables the model to pay attention to different parts of the input sequence.
Transformer consists of an encoder and a decoder. The encoder receives an input sequence and generates a hidden representation, while the decoder receives a hidden representation and generates an output sequence.
Each layer of the encoder and decoder consists of a multi-head attention and a feed-forward neural network. Multi-head attention is the core component of Transformer, which learns to assign different weights based on the relevance of tags.
This information routing approach enables the model to better handle long-term dependencies and, therefore, improves performance in a wide range of NLP tasks.
Another advantage of Transformer is that its architecture makes it highly parallel and allows the data to overcome inductive bias. This feature makes Transformer very suitable for large-scale pre-training, allowing Transformer-based models to adapt to different downstream tasks.
Pre-trained language model
Since the introduction of the Transformer architecture, it has become a mainstream choice for natural language processing due to its parallelism and learning capabilities.
Generally speaking, these Transformer-based pre-trained language models can usually be divided into two categories according to their training tasks: autoregressive language models, and mask language models.
Given a sentence consisting of multiple tokens, the goal of masked language modeling, such as BERT and RoBERTa, is to predict the probability of the masked token given contextual information.
The most notable example of a masked language model is BERT, which includes masked language modeling and next sentence prediction tasks. RoBERTa uses the same architecture as BERT, improving its performance by increasing the amount of pre-training data and incorporating more challenging pre-training objectives.
XL-Net is also based on BERT, which incorporates permutation operations to change the order of predictions for each training iteration, enabling the model to learn more cross-label information.
Autoregressive language models, such as GPT-3 and OPT, model the probability given the previous token, and are therefore left-to-right language models. Unlike masked language models, autoregressive language models are more suitable for generative tasks.
Reinforcement Learning from Human Feedback
Despite being trained on large-scale data, AIGC may not always output content consistent with user intent.
To make AIGC output better match human preferences, reinforcement learning from human feedback (RLHF) has been applied to model fine-tuning in various applications, such as Sparrow, InstructGPT, and ChatGPT.
Normally, the entire process of RLHF includes the following three steps: pre-training, reward learning and fine-tuning of reinforcement learning.
Computing
Hardware
In recent years, hardware technology has made significant progress, facilitating the training of large models.
In the past, training a large neural network using a CPU could take days or even weeks. However, with the increase in computing power, this process has been accelerated by several orders of magnitude.
For example, NVIDIA’s NVIDIA A100 GPU is 7 times faster than V100 and 11 times faster than T4 in BERT large-scale inference process.
In addition, Google’s Tensor Processing Unit (TPU) is designed for deep learning and provides higher computing performance compared to the A100 GPU.
The accelerated advancement of computing power has significantly improved the efficiency of artificial intelligence model training, providing new possibilities for the development of large and complex models.
Distributed training
Another major improvement is distributed training.
In traditional machine learning, training is usually performed on a machine using a single processor. This approach works well for small datasets and models, but becomes impractical when dealing with large datasets and complex models.
In distributed training, the training tasks are distributed to multiple processors or machines, which greatly improves the training speed of the model.
Some companies have also released frameworks that simplify the distributed training process of deep learning stacks. These frameworks provide tools and APIs that allow developers to easily distribute training tasks across multiple processors or machines without having to manage the underlying infrastructure.
Cloud computing
Cloud computing also plays a vital role in training large models. Previously, models were often trained locally. Now, with cloud computing services such as AWS and Azure providing access to powerful computing resources, deep learning researchers and practitioners can create large GPU or TPU clusters required for large model training on demand.
Collectively, these advances make it possible to develop more complex and accurate models, opening up new possibilities in various areas of artificial intelligence research and applications.
Introduction to the author
Philip S. Yu is a scholar in the field of computer science, an ACM/IEEE Fellow, and a distinguished professor in the Department of Computer Science at the University of Illinois at Chicago (UIC).
He has made world-renowned achievements in the theory and technology of big data mining and management. In response to the challenges of big data in terms of scale, speed and diversity, he has proposed effective and cutting-edge solutions in data mining and management methods and technologies, especially in integrating diverse data, mining data streams, frequent patterns, and subspaces. He made groundbreaking contributions to graphs.
He also made pioneering contributions in the field of parallel and distributed database processing technology, and applied it to the IBM S/390 Parallel Sysplex system, successfully integrating traditional IBM mainframe Transition to parallel microprocessor architecture.
The above is the detailed content of 30 page paper! New work by Yu Shilun's team: AIGC comprehensive survey, development history from GAN to ChatGPT. For more information, please follow other related articles on the PHP Chinese website!

自从 ChatGPT、Stable Diffusion 发布以来,各种相关开源项目百花齐放,着实让人应接不暇。今天,着重挑选几个优质的开源项目分享给大家,对我们的日常工作、学习生活,都会有很大的帮助。

Word文档拆分后的子文档字体格式变了的解决办法:1、在大纲模式拆分文档前,先选中正文内容创建一个新的样式,给样式取一个与众不同的名字;2、选中第二段正文内容,通过选择相似文本的功能将剩余正文内容全部设置为新建样式格式;3、进入大纲模式进行文档拆分,操作完成后打开子文档,正文字体格式就是拆分前新建的样式内容。

用 ChatGPT 辅助写论文这件事,越来越靠谱了。 ChatGPT 发布以来,各个领域的从业者都在探索 ChatGPT 的应用前景,挖掘它的潜力。其中,学术文本的理解与编辑是一种极具挑战性的应用场景,因为学术文本需要较高的专业性、严谨性等,有时还需要处理公式、代码、图谱等特殊的内容格式。现在,一个名为「ChatGPT 学术优化(chatgpt_academic)」的新项目在 GitHub 上爆火,上线几天就在 GitHub 上狂揽上万 Star。项目地址:https://github.com/

面对一夜爆火的 ChatGPT ,我最终也没抵得住诱惑,决定体验一下,不过这玩意要注册需要外国手机号以及科学上网,将许多人拦在门外,本篇博客将体验当下爆火的 ChatGPT 以及无需注册和科学上网,拿来即用的 ChatGPT 使用攻略,快来试试吧!

阅读论文可以说是我们的日常工作之一,论文的数量太多,我们如何快速阅读归纳呢?自从ChatGPT出现以后,有很多阅读论文的服务可以使用。其实使用ChatGPT API非常简单,我们只用30行python代码就可以在本地搭建一个自己的应用。 阅读论文可以说是我们的日常工作之一,论文的数量太多,我们如何快速阅读归纳呢?自从ChatGPT出现以后,有很多阅读论文的服务可以使用。其实使用ChatGPT API非常简单,我们只用30行python代码就可以在本地搭建一个自己的应用。使用 Python 和 C

ChatGPT可以联网后,OpenAI还火速介绍了一款代码生成器,在这个插件的加持下,ChatGPT甚至可以自己生成机器学习模型了。 上周五,OpenAI刚刚宣布了惊爆的消息,ChatGPT可以联网,接入第三方插件了!而除了第三方插件,OpenAI也介绍了一款自家的插件「代码解释器」,并给出了几个特别的用例:解决定量和定性的数学问题;进行数据分析和可视化;快速转换文件格式。此外,Greg Brockman演示了ChatGPT还可以对上传视频文件进行处理。而一位叫Andrew Mayne的畅销作

本篇文章给大家带来了关于php的相关知识,其中主要介绍了我是怎么用ChatGPT学习PHP中AOP的实现,感兴趣的朋友下面一起来看一下吧,希望对大家有帮助。


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Dreamweaver Mac version
Visual web development tools

MantisBT
Mantis is an easy-to-deploy web-based defect tracking tool designed to aid in product defect tracking. It requires PHP, MySQL and a web server. Check out our demo and hosting services.

Notepad++7.3.1
Easy-to-use and free code editor

SAP NetWeaver Server Adapter for Eclipse
Integrate Eclipse with SAP NetWeaver application server.

SublimeText3 Mac version
God-level code editing software (SublimeText3)
