Subvert three concepts! Google's latest research: Is it more accurate to calculate 'similarity' with a poor-performance model?

Calculating the similarity between images is an open problem in computer vision.

Today, with image generation popular all over the world, how to define "similarity" is also a key issue in evaluating the realism of generated images.

There are some relatively direct ways to calculate image similarity, such as measuring differences at the pixel level (for example FSIM and SSIM), but the similarity they report is far from what the human eye perceives.
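
The gap between pixel differences and perception can be illustrated with a toy case. The sketch below uses plain mean squared error, which is simpler than FSIM or SSIM and is chosen here only for illustration; the striped test image and the two distortions are assumptions, not examples from the paper. A one-pixel shift of a fine texture looks almost unchanged to a person, yet it produces a far larger pixel error than a visible brightness change.

```python
import numpy as np

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two images with values in [0, 1]."""
    return float(np.mean((a - b) ** 2))

# Fine vertical stripes: columns alternate between 0.2 and 0.8.
stripes = np.tile(np.array([0.2, 0.8]), (32, 16))  # shape (32, 32)

# Distortion 1: shift the texture by one pixel to the right.
shifted = np.roll(stripes, shift=1, axis=1)

# Distortion 2: brighten every pixel by 0.1.
brighter = stripes + 0.1

mse_shift = mse(stripes, shifted)
mse_bright = mse(stripes, brighter)

print(f"MSE after 1-pixel shift:  {mse_shift:.3f}")
print(f"MSE after brightening:    {mse_bright:.3f}")
```

The shifted image differs from the original at every pixel, so the pixel error is large even though the texture is the same.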

After the rise of deep learning, some researchers found that the intermediate representations of neural network classifiers trained on ImageNet, such as AlexNet, VGG, and SqueezeNet, can be used to compute perceptual similarity.

In other words, embeddings are closer than pixels to how people perceive the similarity of images.
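
As a rough sketch of how intermediate representations can be turned into a distance, the code below normalizes each feature map along the channel axis, takes the squared difference, averages over spatial positions, and sums over layers. This follows the general pattern of feature-based perceptual metrics and is not necessarily the exact function used in the paper. The random arrays stand in for feature maps that a real classifier would produce; their shapes are assumptions.

```python
import numpy as np

def unit_normalize(feat: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Scale the channel vector at each spatial position to unit length.

    feat has shape (height, width, channels).
    """
    norm = np.sqrt(np.sum(feat ** 2, axis=-1, keepdims=True))
    return feat / (norm + eps)

def feature_distance(feats_a, feats_b) -> float:
    """Sum over layers of the spatially averaged squared feature difference."""
    total = 0.0
    for fa, fb in zip(feats_a, feats_b):
        diff = unit_normalize(fa) - unit_normalize(fb)
        total += float(np.mean(np.sum(diff ** 2, axis=-1)))
    return total

# Stand-in feature maps for two images at two layers (hypothetical shapes).
rng = np.random.default_rng(0)
feats_a = [rng.normal(size=(16, 16, 8)), rng.normal(size=(8, 8, 16))]
feats_b = [rng.normal(size=(16, 16, 8)), rng.normal(size=(8, 8, 16))]

d_same = feature_distance(feats_a, feats_a)
d_ab = feature_distance(feats_a, feats_b)
d_ba = feature_distance(feats_b, feats_a)
print(d_same, d_ab)
```

In practice the feature lists would come from the chosen layers of a trained network rather than from a random generator.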

Of course, this is just a hypothesis.

Recently, Google published a paper that specifically studies whether better ImageNet classifiers are also better at evaluating perceptual similarity.

Paper link: https://openreview.net/pdf?id=qrGKGZZvH0

Earlier work on the BAPPS dataset, released in 2018, studied perceptual scores on first-generation ImageNet classifiers. To further evaluate the correlation between accuracy and perceptual score, as well as the impact of various hyperparameters, the paper adds results for the more recent ViT models.
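
The article does not spell out how a perceptual score is computed. A minimal sketch, assuming the two-alternative forced choice setup that BAPPS is known for: each sample has a reference image and two distorted versions, human raters vote for the version closer to the reference, and a model is credited with the fraction of raters who agree with its choice. The function name and the three toy samples below are hypothetical.

```python
import numpy as np

def two_afc_score(d0, d1, human_pref_1) -> float:
    """Average agreement between a distance function and human votes.

    d0, d1: model distances from the reference to distortion 0 and distortion 1.
    human_pref_1: fraction of raters who judged distortion 1 closer to the reference.
    """
    d0 = np.asarray(d0, dtype=float)
    d1 = np.asarray(d1, dtype=float)
    h = np.asarray(human_pref_1, dtype=float)
    # If the model says distortion 0 is closer, it agrees with the (1 - h) raters
    # who chose distortion 0; if it says distortion 1, it agrees with h; ties get 0.5.
    agreement = np.where(d0 < d1, 1.0 - h, np.where(d1 < d0, h, 0.5))
    return float(np.mean(agreement))

# Three toy samples (made-up values for illustration only).
score = two_afc_score(
    d0=[0.1, 0.5, 0.3],
    d1=[0.4, 0.2, 0.3],
    human_pref_1=[0.2, 0.8, 0.5],
)
print(f"perceptual score: {score:.2f}")
```

Under this assumption a higher score means the model's notion of distance matches human judgments more often.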

The higher the accuracy, the worse the perceived similarity?

As is well known, features learned by training on ImageNet transfer well to many downstream tasks and improve their performance, which has made ImageNet pre-training standard practice.

Additionally, higher accuracy on ImageNet often means better performance on a diverse set of downstream tasks, such as robustness to corrupted images, generalization to out-of-distribution data, and transfer learning to smaller classification datasets.

But in terms of perceptual similarity calculation, everything seems to be reversed.

Models that achieve high accuracy on ImageNet have worse perceptual scores, while models with "mid-range" accuracy perform best on the perceptual similarity task.

ImageNet 64 × 64 validation accuracy (x-axis) versus perceptual score on the 64 × 64 BAPPS dataset (y-axis). Each blue dot represents an ImageNet classifier.

It can be seen that better ImageNet classifiers achieve better perceptual scores up to a point, but beyond a certain threshold, increasing the accuracy reduces the perceptual score. Classifiers with moderate accuracy (20.0 to 40.0) obtain the best perceptual scores. The paper also studies the impact of neural network hyperparameters on perceptual scores, such as width, depth, number of training steps, weight decay, label smoothing, and dropout.

For each hyperparameter there is an optimal accuracy, up to which increasing the accuracy improves the perceptual score, but this optimum is quite low and is reached very early in the hyperparameter sweep.

Beyond this point, improvements in classifier accuracy lead to worse perceptual scores.

As an example, the article gives the changes in perceptual scores relative to two hyperparameters: training steps in ResNets and width in ViTs.

Early-stopped ResNets achieve the best perceptual scores at depths of 6, 50, and 200.

The perceptual scores of ResNet-50 and ResNet-200 peak in the first few epochs of training, but after the peak, the perceptual score of the better-performing classifier drops more sharply.

The results show that training ResNets with a learning rate schedule improves model accuracy as the number of steps increases. Likewise, after the peak, the models show a progressive decrease in perceptual score that mirrors this progressive increase in accuracy.

A ViT consists of a stack of Transformer blocks applied to the input image. The width of a ViT model is the number of output neurons of a single Transformer block. Increasing the width can effectively improve the accuracy of the model.

The researchers varied the width of two ViT variants, B/8 (the Base ViT model with patch size 8) and L/4 (the Large ViT model with patch size 4), and evaluated accuracy and perceptual scores.

The results are again similar to those observed for early-stopping ResNets, with narrower ViTs with lower accuracy performing better than the default width.

The optimal widths of ViT-B/8 and ViT-L/4 are 6% and 12% of their default widths, respectively. The paper also provides a more detailed list of experiments on other hyperparameters, such as width, depth, number of training steps, weight decay, label smoothing, and dropout, across ResNets and ViTs.

So if you want to improve perceptual similarity, the strategy is simple: reduce the accuracy appropriately.

Improving the perceptual score by scaling down ImageNet models. Each value in the table is the improvement obtained by scaling down a model along a given hyperparameter, relative to the model with default hyperparameters.

Based on the above conclusion, the paper proposes a simple strategy to improve the perceptual score of an architecture: shrink the model to reduce its accuracy until the optimal perceptual score is reached.

The experimental results also show the perceptual score improvement obtained by scaling down each model along each hyperparameter. Early stopping yields the highest score improvement across all architectures except ViT-L/4, and it is the most effective strategy that does not require a time-consuming grid search.
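
Early stopping for perceptual quality amounts to keeping the checkpoint with the best perceptual score rather than the best accuracy. The sketch below assumes both numbers were logged for each saved checkpoint. The `Checkpoint` class, the function name, and the logged values are made up for illustration and are not results from the paper.

```python
from dataclasses import dataclass

@dataclass
class Checkpoint:
    epoch: int
    accuracy: float          # ImageNet validation accuracy (%)
    perceptual_score: float  # score on a perceptual benchmark (%)

def best_perceptual_checkpoint(history):
    """Return the checkpoint with the highest perceptual score."""
    return max(history, key=lambda c: c.perceptual_score)

# Hypothetical training log: accuracy keeps rising while the
# perceptual score peaks early and then declines.
history = [
    Checkpoint(epoch=1, accuracy=15.0, perceptual_score=64.0),
    Checkpoint(epoch=5, accuracy=30.0, perceptual_score=66.5),
    Checkpoint(epoch=20, accuracy=50.0, perceptual_score=65.0),
    Checkpoint(epoch=90, accuracy=65.0, perceptual_score=63.0),
]

chosen = best_perceptual_checkpoint(history)
print(f"keep epoch {chosen.epoch}: accuracy {chosen.accuracy}, "
      f"perceptual score {chosen.perceptual_score}")
```

Because the selection reuses checkpoints from a single training run, no separate sweep over width or depth is needed.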

Global perceptual function

In previous work, the perceptual similarity function was computed using Euclidean distances across the spatial dimensions of the image.

This approach assumes a direct correspondence between pixels, which may not hold for warped, translated, or rotated images.

In this paper, the researchers adopt two perceptual functions that rely on global representations of the image: the style loss function from neural style transfer, which captures the style similarity between two images, and a normalized average pooling distance function.

The style loss function compares the inter-channel cross-correlation matrix between two images, while the average pooling function compares the spatially averaged global representation.
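
Both global functions can be sketched in a few lines. The code below is a minimal illustration, assuming feature maps of shape (height, width, channels). The scaling of the cross-correlation (Gram) matrix and the unit-length normalization of the pooled vector are assumptions, since the article does not give the exact formulas. The example also checks that a spatial shift of the features leaves both distances at approximately zero, because neither function depends on where in the image a feature occurs.

```python
import numpy as np

def gram_matrix(feat: np.ndarray) -> np.ndarray:
    """Channel-by-channel cross-correlation of a (H, W, C) feature map."""
    h, w, c = feat.shape
    flat = feat.reshape(h * w, c)
    return flat.T @ flat / (h * w)

def style_distance(feat_a: np.ndarray, feat_b: np.ndarray) -> float:
    """Squared Frobenius distance between the two Gram matrices."""
    diff = gram_matrix(feat_a) - gram_matrix(feat_b)
    return float(np.sum(diff ** 2))

def mean_pool_distance(feat_a, feat_b, eps: float = 1e-10) -> float:
    """Squared distance between unit-normalized, spatially averaged features."""
    pooled_a = feat_a.mean(axis=(0, 1))
    pooled_b = feat_b.mean(axis=(0, 1))
    pooled_a = pooled_a / (np.linalg.norm(pooled_a) + eps)
    pooled_b = pooled_b / (np.linalg.norm(pooled_b) + eps)
    return float(np.sum((pooled_a - pooled_b) ** 2))

# Stand-in feature maps (random values, hypothetical shapes).
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 8, 4))
shifted = np.roll(feat, shift=3, axis=1)   # same features, translated
other = rng.normal(size=(8, 8, 4))         # unrelated features

print(style_distance(feat, shifted), mean_pool_distance(feat, shifted))
print(style_distance(feat, other), mean_pool_distance(feat, other))
```

A position-wise Euclidean distance on the same shifted features would be large, which is the pixel correspondence problem described above.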

Global perceptual functions consistently improve perceptual scores, both across networks trained with default hyperparameters and for ResNet-200 as a function of training epochs.

The paper also explores some hypotheses to explain the relationship between accuracy and perceptual score, and derives some additional insights.

For example, the accuracy of models without the commonly used skip connections is also inversely correlated with the perceptual score, with layers closer to the output having on average lower perceptual scores compared to layers closer to the input.

The paper further explores distortion sensitivity, ImageNet class granularity, and spatial frequency sensitivity.

In short, this paper explores whether improving classification accuracy produces better perceptual metrics. It studies the relationship between accuracy and perceptual score on ResNets and ViTs under different hyperparameters, and finds an inverted U-shaped relationship: accuracy and perceptual score are positively related up to a certain point, after which higher accuracy comes with a lower perceptual score.

Finally, the paper discusses the relationship between accuracy and perceptual score in detail, covering skip connections, global similarity functions, distortion sensitivity, layer-wise perceptual scores, spatial frequency sensitivity, and ImageNet class granularity.

While the exact explanation for the trade-off between ImageNet accuracy and perceptual similarity remains a mystery, this paper is a first step forward.

This article is reproduced from 51CTO.COM.