CVPR 2023 paper summary! The hottest field of CV is awarded to multi-modal and diffusion models

The annual CVPR will officially open in Vancouver, Canada from June 18th to 22nd.

Every year, thousands of CV researchers and engineers from around the world gather for the conference. This prestigious event dates back to 1983 and represents the pinnacle of computer vision research.

Currently, CVPR's h5-index ranks fourth among all conferences and publications, behind only Nature, Science and the New England Journal of Medicine.

Some time ago, CVPR announced its paper acceptance results. According to statistics on the official website, a total of 9,155 papers were submitted and 2,359 were accepted, for an acceptance rate of 25.8%.
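As a quick sanity check, the acceptance rate follows directly from the two counts. The snippet below is only a minimal sketch; the variable names are illustrative.

```python
# Counts quoted from the official CVPR 2023 statistics.
submitted = 9155
accepted = 2359

# Acceptance rate as a percentage, rounded to one decimal place.
acceptance_rate = round(100 * accepted / submitted, 1)
print(f"Acceptance rate: {acceptance_rate}%")  # Acceptance rate: 25.8%
```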

In addition, 12 award candidate papers were announced.

So, what are the highlights of this year’s CVPR? What trends can we see in the CV field from the accepted papers?

Read on to find out.

CVPR at a glance

The startup Voxel51 analyzed the list of all accepted papers.

Let's first look at a summary diagram of the paper titles. The size of each word is proportional to how often it occurs in the dataset.
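A diagram like this can be driven by a simple word count. The sketch below is not Voxel51's actual code: the three titles are taken from later in this article, and the stop-word list and the `max_size` parameter are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small sample of titles quoted later in this article; the real analysis
# used all accepted titles.
titles = [
    "Learned Image Compression with Mixed Transformer-CNN Architectures",
    "DynaMask: Dynamic Mask Selection for Instance Segmentation",
    "SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation",
]

# Illustrative stop-word list (an assumption, not from the original analysis).
STOPWORDS = {"a", "and", "for", "in", "of", "the", "with"}

def word_frequencies(titles):
    """Count lowercase alphabetic tokens across all titles, skipping stop words."""
    counts = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

def font_sizes(freqs, max_size=48):
    """Scale sizes so each word's size is proportional to its count."""
    top = max(freqs.values())
    return {word: max_size * count / top for word, count in freqs.items()}

freqs = word_frequencies(titles)
sizes = font_sizes(freqs)
print(freqs.most_common(3))
```

The most frequent word receives `max_size`, and every other word is scaled linearly from it, so a word that occurs half as often is drawn at half the size.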

Brief description

- 2,359 papers accepted (9,155 papers submitted)

- 1,724 arXiv papers

- 68 papers hosted at other addresses

Authors per paper

- A CVPR paper has about 5.4 authors on average

- The paper with the most authors is "Why is the winner the best?", with 125 authors

- There are 13 papers with only one author.

Main Arxiv classification

Of the 1,724 arXiv papers, 1,545, or close to 90%, list cs.CV as their primary category.

cs.LG ranks second with 101 papers. eess.IV (26) and cs.RO (16) also claim a share.

Other categories for CVPR papers include: cs.HC, cs.CV, cs.AR, cs.DC, cs.NE, cs.SD, cs.CL, cs.IT, cs.CR, cs.AI, cs.MM, cs.GR, eess.SP, eess.AS, math.OC, math.NT, physics.data-an and stat.ML.

"Meta" data

- The words "dataset" and "model" appear together in 567 abstracts. "Dataset" appears without "model" in 265 abstracts, while "model" appears without "dataset" in 613. Only 16.2% of the papers analyzed contain neither word.
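The counts above can be cross-checked with a little arithmetic. The sketch below suggests that the 16.2% figure is relative to the 1,724 papers with arXiv abstracts rather than to all 2,359 accepted papers. That denominator is an inference from the numbers, not something the article states.

```python
# Abstract counts quoted in the article.
both = 567          # abstracts containing "dataset" and "model"
dataset_only = 265  # "dataset" without "model"
model_only = 613    # "model" without "dataset"

arxiv_papers = 1724     # papers with an arXiv abstract
accepted_papers = 2359  # all accepted papers

with_either = both + dataset_only + model_only

# Share containing neither word, under two candidate denominators.
share_of_arxiv = round(100 * (arxiv_papers - with_either) / arxiv_papers, 1)
share_of_accepted = round(100 * (accepted_papers - with_either) / accepted_papers, 1)

print(with_either)        # 1445
print(share_of_arxiv)     # 16.2, matches the quoted figure
print(share_of_accepted)  # 38.7, does not match
```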

- According to CVPR paper abstracts, the most popular datasets this year are ImageNet (105), COCO (94), KITTI (55) and CIFAR (36).

- 28 papers propose a new "benchmark".

Acronyms abound

It seems there is no machine learning project without an acronym. Of the 2,359 papers, 1,487 (63%) have titles containing acronyms or compound words with multiple capital letters.
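The article does not say how such titles were detected. One plausible heuristic, shown here purely as an assumption, is to flag any title containing a token with two or more capital letters. The sample titles all appear elsewhere in this article.

```python
import re

# Heuristic (an assumption, not the original analysis's rule): a token counts
# as an acronym or capitalized compound if it has at least two capital letters.
ACRONYM_TOKEN = re.compile(r"\b\w*[A-Z]\w*[A-Z]\w*\b")

def has_acronym(title):
    """Return True if any word in the title has two or more capitals."""
    return ACRONYM_TOKEN.search(title) is not None

sample_titles = [
    "CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose",
    "DynaMask: Dynamic Mask Selection for Instance Segmentation",
    "Learned Image Compression with Mixed Transformer-CNN Architectures",
    "Tracking Every Thing in the Wild",
    "Seeing a Rose in Five Thousand Ways",
    "Why is the winner the best?",
]

flagged = [t for t in sample_titles if has_acronym(t)]
sample_share = 100 * len(flagged) / len(sample_titles)

# The reported share for the full list: 1,487 of 2,359 titles.
reported_share = round(100 * 1487 / 2359)
print(len(flagged), sample_share, reported_share)
```

A rule like this also flags camel-case names such as "DynaMask", which fits the article's mention of compound words in capital letters.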

Some of these acronyms are easy to remember and even roll off the tongue:

- CLAMP: Prompt-based Contrastive Learning for Connecting Language and Animal Pose

- PATS: Patch Area Transportation with Subdivision for Local Feature Matching

- CIRCLE: Capture In Rich Contextual Environments

Some are much more complex:

- SIEDOB: Semantic Image Editing by Disentangling Object and Background

- FJMP: Factorized Joint Multi-Agent Motion Prediction over Learned Directed Acyclic Interaction Graphs

Some of them seem to have borrowed ideas from others on acronym construction:

- SCOTCH and SODA: A Transformer Video Shadow Detection Framework (after the popular Dutch brand Scotch & Soda)

- EXCALIBUR: Encouraging and Evaluating Embodied Exploration (Excalibur, the legendary sword, lol)

What’s the hottest?

In addition to the 2023 paper titles, we scraped the titles of all papers accepted in 2022. From these two lists, we calculated the relative frequency of various keywords to show what is trending up and what is trending down.
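The comparison can be sketched as follows. Relative frequency (the share of titles containing a keyword) is used rather than raw counts because the number of accepted papers differs between years. The title lists here are made-up placeholders, and the function names are illustrative rather than taken from the original analysis.

```python
def relative_frequency(keyword, titles):
    """Fraction of titles that contain the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return sum(keyword in title.lower() for title in titles) / len(titles)

def percent_change(keyword, titles_prev, titles_curr):
    """Year-over-year change in relative frequency, in percent.

    Returns None when the keyword is absent from the earlier year,
    since the change is undefined in that case.
    """
    prev = relative_frequency(keyword, titles_prev)
    curr = relative_frequency(keyword, titles_curr)
    if prev == 0:
        return None
    return 100 * (curr - prev) / prev

# Hypothetical placeholder titles, not real CVPR papers.
titles_2022 = [
    "diffusion for images",
    "a cnn baseline",
    "cnn for video",
    "transformer tracking",
]
titles_2023 = [
    "diffusion editing",
    "text to image diffusion",
    "video diffusion",
    "a cnn baseline",
    "point cloud transformer",
]

for keyword in ("diffusion", "cnn", "nerf"):
    print(keyword, percent_change(keyword, titles_2022, titles_2023))
```

Plain substring matching is a simplification: a short keyword can match inside longer words, so a real analysis would likely tokenize titles first.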

Model

In 2023, diffusion models dominate.

Diffusion Model

With the popularity of image generation models such as Stable Diffusion and Midjourney, it is no surprise that diffusion models are a hot trend.

Diffusion models also have applications in denoising, image editing, and style transfer. Add it all up, and it's by far the biggest winner across all categories, up 573% year-over-year.

Radiance Field

Neural radiance fields (NeRF) are also becoming more popular: the word "radiance" increased by 80%, and "NeRF" by 39%. NeRF has moved from proof of concept to editing, applications and training process optimization.

Transformers

The declining usage of "Transformer" and "ViT" does not mean that the Transformer model is outdated. Rather, it reflects how dominant these models already were in 2022. In 2021, the word "Transformer" appeared in only 37 papers. In 2022, that number soared to 201. Transformers aren't going away anytime soon.

CNN

CNNs used to be the darling of computer vision. In 2023, they seem to have lost their edge: usage dropped by 68%. Many titles that mention CNNs also mention other models. For example, these papers mention both CNN and Transformer:

- Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

- Learned Image Compression with Mixed Transformer-CNN Architectures

Task

Masked tasks, together with masked image modeling, occupy a dominant position at CVPR.

Generative

Traditional discriminative tasks such as detection, classification and segmentation have not fallen out of favor, but their share of CV is shrinking because of a series of advances in generative applications, as the rise of "editing", "synthesis" and "generation" shows.

Mask

The keyword "mask" increased by 263% year over year. It appears 92 times in the titles of papers accepted in 2023, sometimes twice in a single title.

- SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

- DynaMask: Dynamic Mask Selection for Instance Segmentation

But the majority (64%) actually refer to "masked" tasks, including 8 "masked image modeling" and 15 "masked autoencoder" tasks. In addition, "masking" appears in 8 titles.

It is also worth noting that 3 paper titles containing the word "mask" actually refer to "mask-free" tasks.

Zero-shot vs. few-shot

With the rise of transfer learning, generative methods, prompting and general-purpose models, "zero-shot" learning is gaining traction. Meanwhile, "few-shot" learning has declined from last year. In raw numbers, however, "few-shot" (45) still holds a slight edge over "zero-shot" (35), at least for now.

Modality

In 2023, the development of multi-modal and cross-modal applications is accelerating.

Blurred boundaries

Although the frequency of traditional computer vision keywords such as "image" and "video" remains relatively unchanged, "text"/"language" and "audio" appear more often.

Even if the word "multimodal" itself does not appear in the title of the paper, it is difficult to deny that computer vision is heading towards a multimodal future.

This is especially evident in vision-language tasks, as shown by the sharp rise of "open", "prompt" and "vocabulary".

The most extreme example of this situation is the compound word "open vocabulary", which only appeared 3 times in 2022, but 18 times in 2023.

Digging deeper into the keywords in CVPR 2023 paper titles

Point Cloud 9

3D computer vision applications are moving from inferring 3D information ("depth" and "stereo") from 2D images toward computer vision systems that work directly on 3D point cloud data.

Creativity in CV Titles

For each paper uploaded to arXiv, we scraped the abstract and asked ChatGPT (GPT-3.5 API) to generate a title for the corresponding CVPR paper.

Then we took the ChatGPT-generated titles and the actual paper titles, generated embedding vectors with OpenAI's text-embedding-ada-002 model, and calculated the cosine similarity between each ChatGPT-generated title and the corresponding author-written title.

What can this tell us? The closer ChatGPT's title is to the actual paper title, the more predictable that title is. In other words, the further off ChatGPT's prediction is, the more "creative" the authors were in naming the paper.

Embedding and cosine similarity provide us with an interesting, although far from perfect, method of quantification.
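The scoring step can be sketched without calling any API. In the example below, tiny 3-dimensional vectors stand in for real embeddings (which have far more dimensions), and the paper names are placeholders. Only the cosine similarity and the sort order reflect the method described above.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Placeholder data: (embedding of actual title, embedding of predicted title).
# Real vectors would come from an embedding model; these are toy values.
papers = {
    "paper_a": ([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),  # identical direction
    "paper_b": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),  # orthogonal
    "paper_c": ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # 45 degrees apart
}

scores = {
    name: cosine_similarity(actual, predicted)
    for name, (actual, predicted) in papers.items()
}

# Lowest similarity first: the least predictable titles rank as most "creative".
ranking = sorted(scores, key=scores.get)
print(ranking)  # ['paper_b', 'paper_c', 'paper_a']
```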

We sorted the papers according to this metric. Without further ado, here are the most creative titles:

Actual title: Tracking Every Thing in the Wild

Predicted title: Disentangling Classification from Tracking: Introducing TETA for Comprehensive Benchmarking of Multi-Category Multiple Object Tracking

Actual title: Learning to Bootstrap for Combating Label Noise

Predicted title: Learnable Loss Objective for Joint Instance and Label Reweighting in Deep Neural Networks

Actual title: Seeing a Rose in Five Thousand Ways

Predicted title: Learning Object Intrinsics from Single Internet Images for Superior Visual Rendering and Synthesis

Actual title: Why is the winner the best?

Predicted title: Analyzing Winning Strategies in International Benchmarking Competitions for Image Analysis: Insights from a Multi-Center Study of IEEE ISBI and MICCAI 2021

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.