Worrying image quality interferes with visual recognition, DAMO Academy proposes a more robust framework

This article introduces the paper "Improving Training and Inference of Face Recognition Models via Random Temperature Scaling", accepted at AAAI 2023, a top international conference on artificial intelligence. From a probabilistic perspective, the paper analyzes the relationship between the temperature parameter of the classification loss function and classification uncertainty, and shows that the temperature factor is the scale coefficient of an uncertainty variable that follows a Gumbel distribution. Based on this, a new training framework called RTS is proposed to model the reliability of feature extraction. RTS makes training more stable and yields a more reliable recognition model. At deployment it also provides a per-sample uncertainty score that can be used to reject high-uncertainty samples, which helps build a more robust visual recognition system. Extensive experiments show that RTS trains stably and outputs uncertainty measures suitable for building such a system.

  • Paper address: https://arxiv.org/abs/2212.01015
  • Open source model: https://modelscope.cn/models/damo/cv_ir_face-recognition-ood_rts/summary
Background

Uncertainty problem: Visual recognition systems encounter many kinds of interference in real scenes, for example occlusion (accessories or a complex foreground), blur (defocus or motion blur), and extreme lighting (overexposure or underexposure). These can all be summarized as noise. In addition, detectors sometimes produce false detections, typically cat or dog faces. Such falsely detected data are called out-of-distribution (OOD) data. For visual recognition, noise and OOD data together constitute the sources of uncertainty. Affected samples carry this uncertainty into the features extracted by the deep model and interfere with the recognition system. For example, if gallery images are contaminated by such samples, a "feature black hole" forms, which poses a hidden risk to the system. Representation reliability therefore needs to be modeled.

Related work on representation reliability modeling

Traditional multi-model solution

Traditionally, reliability in the visual recognition pipeline is controlled by an independent quality model. A typical image quality modeling procedure is as follows:

1. Collect data and annotate the specific factors that affect quality, such as sharpness, presence or absence of occlusion, and pose.

2. Map the labels of these factors to a quality score from 1 to 10. The higher the score, the better the quality. For specific examples, see the left side of the figure below.

3. With the quality score annotations from the first two steps, train an ordinal regression model that predicts the quality score at deployment, as shown on the right side of the figure below.

[Figure: quality score annotation examples (left) and quality score prediction at deployment (right)]

The independent quality model solution introduces an additional model into the visual recognition pipeline, and its training relies on annotations.

DUL

One uncertainty modeling method is "Data Uncertainty Learning in Face Recognition" (DUL). It models each feature as a Gaussian distribution: the feature is a mean plus a variance-dependent noise term, and this uncertain feature is sent to the subsequent classifier for training. An uncertainty score related to image quality can then be obtained at deployment.

Because DUL describes uncertainty additively, the scale of the estimated noise is closely tied to the feature distribution of each class. If the data distribution of a class is tight, the noise scale estimated by DUL is also small. Work in the OOD field points out that the density of the data distribution is not a good metric for OOD identification.
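The additive form described above can be sketched as follows. This is a minimal illustration, assuming the usual reparameterized form z = mu + sigma * eps with standard normal eps. The function names, the example vectors, and the scalar score (mean of sigma) are hypothetical and are not taken from the DUL paper.

```python
import random

def sample_feature(mu, sigma, rng):
    """Draw z = mu + sigma * eps, with eps ~ N(0, 1) independently per dimension."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def uncertainty_score(sigma):
    """Hypothetical scalar score: the mean predicted noise scale."""
    return sum(sigma) / len(sigma)

rng = random.Random(0)
mu = [0.2, -0.5, 1.0]              # hypothetical predicted mean feature
sharp_sigma = [0.05, 0.05, 0.05]   # hypothetical noise scale for a clean image
blurry_sigma = [0.8, 0.9, 0.7]     # hypothetical noise scale for a degraded image

z = sample_feature(mu, blurry_sigma, rng)
print(z, uncertainty_score(sharp_sigma), uncertainty_score(blurry_sigma))
```

Because the noise is added to the feature, its scale is measured relative to how spread out the features are, which is the coupling with the data distribution noted above.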

GODIN

In the OOD field, "Generalized ODIN: Detecting Out-of-Distribution Image without Learning from Out-of-Distribution Data" (GODIN) handles OOD data through a joint probability decomposition, using two independent branches h(x) and g(x) to estimate the classification probability and the temperature value respectively.

Since the temperature value is modeled as a probability and its range is limited to (0, 1), the temperature is not modeled well.
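The limitation can be seen in a small sketch. It assumes the temperature branch is squashed by a sigmoid and divides the class scores, so the effective temperature always lies in (0, 1). The function names and numbers are hypothetical, and a real implementation would compute both branches from network features.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def divided_probs(h, g_logit):
    """Class probabilities from h(x) / g(x), where g(x) = sigmoid(g_logit) lies in (0, 1)."""
    g = sigmoid(g_logit)
    return softmax([score / g for score in h]), g

h = [2.0, 1.0, 0.0]                 # hypothetical class scores h(x)
probs, g = divided_probs(h, -1.0)   # hypothetical pre-sigmoid temperature output
plain = softmax(h)
print(g, probs, plain)
```

With g(x) below 1, dividing can only sharpen the distribution relative to the plain softmax. It cannot express a temperature above 1, which would flatten the prediction for an uncertain sample.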

Method

In view of the above problems and related work, this paper takes a probabilistic perspective and studies the relationship between the temperature adjustment factor in the classification loss function and uncertainty. Based on this analysis, the RTS training framework is proposed.

Analysis of the temperature adjustment factor from a probabilistic perspective

First, analyze the relationship between the temperature adjustment factor and uncertainty. Assume that the uncertainty $\epsilon$ is a random variable that follows the standard Gumbel distribution. Its probability density function can then be written as

$$f(\epsilon) = e^{-(\epsilon + e^{-\epsilon})},$$

and its cumulative distribution function is $F(\epsilon) = e^{-e^{-\epsilon}}$. Writing $s_k$ for the classification score of class $k$ and $\epsilon_k$ for its uncertainty, the probability of being classified into class $k$ is:

$$P(y = k) = P\left(s_k + \epsilon_k > s_j + \epsilon_j,\ \forall j \neq k\right) = \int_{-\infty}^{\infty} f(\epsilon_k) \prod_{j \neq k} F(s_k - s_j + \epsilon_k)\, d\epsilon_k$$

Substituting $f$ and $F$ into the above formula gives:

$$P(y = k) = \frac{e^{s_k}}{\sum_j e^{s_j}}$$

The probability of being classified into class $k$ is therefore exactly the softmax score. We can also use a factor $t$ to adjust the scale of the uncertainty, that is, the uncertainty becomes $t\epsilon$ where $\epsilon$ follows the standard Gumbel distribution:

$$P(y = k) = P\left(s_k + t\epsilon_k > s_j + t\epsilon_j,\ \forall j \neq k\right) = \frac{e^{s_k / t}}{\sum_j e^{s_j / t}}$$

The probability of being classified into class $k$ is now the softmax score with temperature $t$.
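The link between Gumbel-distributed uncertainty and the temperature-scaled softmax can be checked numerically. The sketch below is a Monte Carlo check under assumed values: the scores, the temperature, and the function names are hypothetical. It adds standard Gumbel noise scaled by t to each score, records which class wins, and compares the frequencies with the softmax at temperature t.

```python
import math
import random

def softmax_with_temperature(scores, t):
    """Softmax of scores / t, computed in a numerically stable way."""
    scaled = [s / t for s in scores]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_gumbel(rng):
    """Standard Gumbel sample via the inverse CDF: -log(-log(U))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))

def empirical_class_frequencies(scores, t, n_trials, rng):
    """Frequency with which each class has the largest score + t * Gumbel noise."""
    counts = [0] * len(scores)
    for _ in range(n_trials):
        noisy = [s + t * sample_gumbel(rng) for s in scores]
        counts[noisy.index(max(noisy))] += 1
    return [c / n_trials for c in counts]

rng = random.Random(0)
scores = [2.0, 1.0, 0.0]   # hypothetical classification scores
t = 2.0                    # hypothetical temperature
freq = empirical_class_frequencies(scores, t, 100_000, rng)
prob = softmax_with_temperature(scores, t)
print(freq)
print(prob)
```

The two printed lists should agree up to sampling error, which illustrates why a larger t corresponds to larger uncertainty: the noise dominates the scores and the class probabilities flatten.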

Modeling temperature

To reduce the impact of uncertainty estimation on classification, the temperature t needs to stay near 1. The temperature t is therefore modeled as a sum of independent gamma-distributed variables, so that t itself follows a gamma distribution with $\beta = \frac{\alpha - 1}{v}$. The influence of v and $\alpha$ on the distribution is shown below.

[Figure: temperature distribution for different values of v and $\alpha$]
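As a rough illustration of a temperature built from a sum of independent gamma variables, the sketch below draws such sums and checks their mean and variance. The number of terms, the shape, and the scale are hypothetical values chosen so that the mean is 1. They are not the parameterization used in the paper.

```python
import random

def sample_temperature(n_terms, shape, scale, rng):
    """Draw t as a sum of n_terms independent Gamma(shape, scale) variables."""
    return sum(rng.gammavariate(shape, scale) for _ in range(n_terms))

rng = random.Random(0)
n_terms, shape, scale = 4, 2.0, 0.125   # hypothetical: mean = 4 * 2 * 0.125 = 1
samples = [sample_temperature(n_terms, shape, scale, rng) for _ in range(50_000)]

mean_t = sum(samples) / len(samples)
var_t = sum((x - mean_t) ** 2 for x in samples) / len(samples)
print(mean_t, var_t)   # theory: mean n*shape*scale, variance n*shape*scale**2
```

The samples are always positive and concentrate around 1, which is the behavior wanted from a temperature that should not disturb classification much.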

During training, the constraint on the temperature model is enforced through a regularization term, whose exact form is given in the paper.

Training method

The overall training algorithm, together with more detailed analysis and theoretical proofs, is given in the paper.

Results

During training, the training data contain only face data. OOD data consisting of cat and dog faces falsely detected as faces are used at test time to evaluate OOD recognition, and also to illustrate how the uncertainty of OOD samples evolves at different stages of training.

Training phase

We plot the uncertainty scores of in-distribution (ID) data (faces) and out-of-distribution data (cat and dog faces falsely detected as faces) at different epochs. As the figure below shows, the uncertainty scores of all samples initially cluster around large values. As training progresses, the uncertainty of OOD samples gradually increases and that of face data gradually decreases. The better the face quality, the lower the uncertainty. ID and OOD data can therefore be separated by a threshold, and image quality is reflected by the uncertainty score.

[Figure: uncertainty scores of ID and OOD samples at different training epochs]
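Thresholding on the uncertainty score can be sketched as follows. The scores and the threshold are hypothetical. In practice the threshold would be chosen on validation data.

```python
def split_by_uncertainty(scores, threshold):
    """Return indices of accepted (score <= threshold) and rejected samples."""
    accepted = [i for i, s in enumerate(scores) if s <= threshold]
    rejected = [i for i, s in enumerate(scores) if s > threshold]
    return accepted, rejected

# Hypothetical uncertainty scores: clear face, blurry face, cat face, clear face
scores = [0.12, 0.48, 0.91, 0.20]
accepted, rejected = split_by_uncertainty(scores, threshold=0.6)
print(accepted, rejected)
```

Lowering the threshold would also reject the blurry face, so the same score can act as an OOD filter and as a quality gate.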

To illustrate robustness to noisy training data, different proportions of noise are applied to the training set. The recognition results of models trained on data with different noise proportions are shown below. RTS still achieves better recognition results when trained on noisy data.

[Figure: recognition results of models trained with different proportions of noisy data]

Deployment phase

The figure below shows that the uncertainty score produced by the RTS framework at deployment is highly correlated with face quality.

[Figure: uncertainty scores and face quality at deployment]

In addition, the false-match curve after removing low-quality samples is plotted on the benchmark. Samples are removed from the benchmark in order of uncertainty score, from high to low, and the false-match curve of the remaining samples is then drawn. As the figure below shows, the more high-uncertainty samples are filtered out, the fewer false matches remain, and for the same number of removed samples RTS leaves fewer false matches.

[Figure: false matches versus the number of high-uncertainty samples removed]
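The rejection procedure described above can be sketched as follows. This is a simplified, per-sample version with hypothetical data: each sample has one uncertainty score and one flag saying whether it produced a false match. A real benchmark would work on image pairs.

```python
def false_matches_after_rejection(uncertainty, is_false_match, reject_fraction):
    """Count false matches left after dropping the most uncertain samples."""
    n_reject = int(len(uncertainty) * reject_fraction)
    # Sample indices sorted from highest to lowest uncertainty
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    kept = order[n_reject:]
    return sum(1 for i in kept if is_false_match[i])

# Hypothetical scores and outcomes for eight samples
uncertainty = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.15, 0.05]
is_false_match = [True, False, True, False, False, True, False, False]

curve = [(f, false_matches_after_rejection(uncertainty, is_false_match, f))
         for f in (0.0, 0.25, 0.5)]
print(curve)
```

A score that ranks the truly problematic samples first makes this curve fall quickly, which is how two uncertainty estimators can be compared at the same rejection rate.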

To verify how well the uncertainty score identifies OOD samples, an in-distribution test set (faces) and an out-of-distribution test set (cat and dog faces falsely detected as faces) were constructed. Sample data are shown below.

[Figure: samples from the ID and OOD test sets]

We evaluate RTS from two angles. First, we plot the distribution of uncertainty scores. As the figure below shows, RTS discriminates OOD data strongly.

[Figure: distribution of uncertainty scores for ID and OOD data]

Second, the ROC curve on the OOD test set is plotted and the area under the ROC curve (AUC) is calculated. The uncertainty score of RTS identifies OOD data better.

[Figure: ROC curves and AUC values on the OOD test set]
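For reference, the AUC of an uncertainty score used as an OOD detector equals the probability that a randomly chosen OOD sample receives a higher score than a randomly chosen ID sample. The sketch below computes it directly from that definition on hypothetical scores.

```python
def roc_auc(ood_scores, id_scores):
    """AUC as the fraction of (OOD, ID) pairs ranked correctly; ties count as 0.5."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(id_scores))

# Hypothetical uncertainty scores
id_scores = [0.1, 0.2, 0.3, 0.6]   # faces
ood_scores = [0.5, 0.7, 0.9]       # cat and dog faces detected as faces
auc = roc_auc(ood_scores, id_scores)
print(auc)
```

An AUC of 1.0 means every OOD sample scores above every face, and 0.5 means the score carries no information about OOD status.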

General recognition ability

General recognition ability is also tested on the benchmark. RTS improves the identification of OOD data without hurting face recognition ability, so the RTS algorithm achieves a balanced result between face recognition and OOD data identification.

Application

The model from this paper is open-sourced on ModelScope. In addition, the following free open-source models in the CV domain are available. Everyone is welcome to try and download them (they can be tried on most mobile phones):

1. https://modelscope.cn/models/damo/cv_resnet50_face-detection_retinaface/summary

2. https://modelscope.cn/models/damo/cv_resnet101_face-detection_cvpr22papermogface/summary

3. https://modelscope.cn/models/damo/cv_manual_face-detection_tinymog/summary

4. https://modelscope.cn/models/damo/cv_manual_face-detection_ulfd/summary

5. https://modelscope.cn/models/damo/cv_manual_face-detection_mtcnn/summary

6. https://modelscope.cn/models/damo/cv_resnet_face-recognition_facemask/summary

7. https://modelscope.cn/models/damo/cv_ir50_face-recognition_arcface/summary

8. https://modelscope.cn/models/damo/cv_manual_face-liveness_flir/summary

9. https://modelscope.cn/models/damo/cv_manual_face-liveness_flrgb/summary

10. https://modelscope.cn/models/damo/cv_manual_facial-landmark-confidence_flcm/summary

11. https://modelscope.cn/models/damo/cv_vgg19_facial-expression-recognition_fer/summary

12. https://modelscope.cn/models/damo/cv_resnet34_face-attribute-recognition_fairface/summary

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.