


Out-of-distribution (OOD) detection is crucial for the reliable operation of open-world intelligent systems, but the evaluation of current OOD detection methods suffers from inconsistencies.
The earlier OpenOOD v1 unified the evaluation of OOD detection, but it still had limitations in scalability and usability.
The development team has now released OpenOOD v1.5. Compared with the previous version, the new release significantly improves the evaluation of OOD detection methods in accuracy, standardization, and user-friendliness.
Paper: https://arxiv.org/abs/2306.09301
OpenOOD Codebase: https://github.com/Jingkang50/OpenOOD
OpenOOD Leaderboard: https://zjysteven.github.io/OpenOOD/
It is worth noting that OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets such as ImageNet, investigates the important yet underexplored problem of full-spectrum OOD detection, and introduces new features including an online leaderboard and an easy-to-use evaluator.
This work also contributes in-depth analysis and insights drawn from comprehensive experimental results, enriching the knowledge base of OOD detection methods.
With these enhancements, OpenOOD v1.5 aims to drive progress in OOD research and provide a more powerful and comprehensive evaluation benchmark for OOD detection research.
Research background
For a trained image classifier, a key capability for reliable operation in the open world is detecting unknown, out-of-distribution (OOD) samples.
For example, suppose we train a cat-vs-dog classifier on a set of cat and dog photos. For in-distribution (ID) samples, i.e., images of cats and dogs, we naturally expect the classifier to assign them to the correct class.
For out-of-distribution (OOD) samples, i.e., images of anything other than cats and dogs (such as airplanes or fruit), we want the model to detect that they depict unknown, novel objects/concepts and therefore cannot be assigned to either in-distribution class.
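The decision rule described above can be sketched as a simple score-and-threshold procedure. The sketch below uses the maximum softmax probability (MSP) baseline, one of the post-hoc methods implemented in OpenOOD; the threshold value here is illustrative, not prescribed:

```python
import math

def softmax(logits):
    """Convert raw classifier logits to probabilities."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def msp_detect(logits, classes, threshold=0.7):
    """MSP baseline: reject a sample as OOD when the top softmax
    probability is low.  `threshold` is an illustrative value; in
    practice it is tuned on held-out ID data."""
    probs = softmax(logits)
    conf = max(probs)
    if conf < threshold:
        return "OOD"
    return classes[probs.index(conf)]

classes = ["cat", "dog"]
print(msp_detect([4.0, 0.5], classes))  # confident -> prints "cat"
print(msp_detect([0.1, 0.2], classes))  # near-uniform -> prints "OOD"
```

A near-uniform softmax output signals that the classifier recognizes neither ID class in the input, which is exactly the cue MSP exploits.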
This problem is out-of-distribution detection (OOD detection), which has attracted widespread attention in recent years, with new work appearing constantly. As the field expands rapidly, however, several factors have made its progress difficult to track and measure.
Reason 1: Inconsistent OOD test datasets.
The rapid development of deep learning tasks has been inseparable from unified test datasets (just as image classification has CIFAR and ImageNet, and object detection has PASCAL VOC and COCO).
Unfortunately, the field of OOD detection has long lacked a unified, widely adopted OOD dataset. Surveying the experimental settings of existing work, we find the OOD data used to be highly inconsistent (for example, with CIFAR-10 as ID data, some works use MNIST and SVHN as OOD, while others use CIFAR-100 and Tiny ImageNet). Under such circumstances, direct and fair comparison across methods is very difficult.
Reason 2: Confusing terminology.
In addition to OOD detection, terms such as open-set recognition (OSR) and novelty detection also frequently appear in the literature.
They essentially address the same problem, differing only in minor details of the experimental setup. The differing terminology, however, has led to unnecessary fragmentation: OOD detection and OSR, for instance, were long treated as two independent tasks, and methods from different branches were rarely compared against each other despite solving the same problem.
Reason 3: Flawed evaluation practices.
In many works, researchers directly use samples from the OOD test set to tune hyperparameters or even train models. Such practice overestimates a method's OOD detection capability.
The above problems are clearly detrimental to the orderly development of the field. A unified benchmark and platform for testing and evaluating existing and future OOD detection methods is urgently needed.
OpenOOD was born out of these challenges. Its first version took an important step, but suffered from limited scale and usability issues.
In the new OpenOOD v1.5, therefore, we have further strengthened and upgraded it, aiming to provide researchers with a comprehensive, accurate, and easy-to-use testing platform.
In summary, OpenOOD offers the following key features and contributions:
1. A large, modular codebase.
The codebase modularizes model architectures, data preprocessing, post-processing, training, and testing to facilitate reuse and development. OpenOOD currently implements nearly 40 state-of-the-art OOD detection methods for image classification.
[Figure: example code for running the OpenOOD evaluator]
2. A one-click evaluator.
As shown in the figure above, with just a few lines of code, OpenOOD's evaluator reports the OOD detection performance of a given classifier and post-processor on a specified ID dataset.
The corresponding OOD data is determined and provided internally by the evaluator, ensuring consistent and fair testing. The evaluator supports both standard OOD detection and full-spectrum OOD detection (more on this below).
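Internally, the headline number such an evaluation produces is AUROC over the detector's ID-vs-OOD scores. The following self-contained sketch (not the OpenOOD API itself; the score values are made-up toy data) shows how that metric is computed:

```python
def auroc(id_scores, ood_scores):
    """AUROC for ID-vs-OOD separation: the probability that a randomly
    chosen ID sample scores higher than a randomly chosen OOD sample
    (ties count as half).  Higher scores should mean "more ID-like"."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))

# Toy detector scores; an AUROC of 1.0 means perfect separation,
# while 0.5 is no better than random guessing.
id_scores = [0.9, 0.8, 0.95, 0.7]
ood_scores = [0.3, 0.6, 0.75, 0.2]
print(f"AUROC: {auroc(id_scores, ood_scores):.4f}")  # prints "AUROC: 0.9375"
```

This quadratic pairwise loop is only for clarity; production implementations rank the scores once instead.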
3. An online leaderboard.
Using OpenOOD, we compared nearly 40 OOD detection methods on four ID datasets: CIFAR-10, CIFAR-100, ImageNet-200, and ImageNet-1K, and compiled the results into a public leaderboard. We hope this helps everyone keep track of the most effective and promising methods in the field.
4. New findings from the experimental results.
Based on OpenOOD's comprehensive experimental results, the paper offers many new findings. For example, although seemingly unrelated to OOD detection, data augmentation can effectively improve OOD detection performance, and this improvement is orthogonal and complementary to that brought by specific OOD detection methods.
In addition, we found that existing methods perform unsatisfactorily on full-spectrum OOD detection, which will be an important open problem for the field.
Problem Description
This section briefly and informally describes the goals of standard and full-spectrum OOD detection; for a more detailed and formal treatment, please see our paper.
[Figure: semantic shift along the horizontal axis vs. covariate shift along the vertical axis]
First, some background. In the image classification setting we consider, the in-distribution (ID) data is defined by the classification task: for CIFAR-10 classification, for example, the ID distribution corresponds to its 10 semantic categories.
OOD is defined relative to ID: images from any semantic category other than the ID categories are out-of-distribution (OOD) images. We also need to distinguish two types of distribution shift.
Semantic shift: the distribution changes at the deep, semantic level, corresponding to the horizontal axis of the figure above. For example, the training categories are cats and dogs, while the test categories are airplanes and fruit.
Covariate shift: the distribution changes at the surface, statistical level while the semantics remain unchanged, corresponding to the vertical axis of the figure above. For example, training uses clean, natural photos of cats and dogs, while testing uses noisy or hand-drawn images of cats and dogs.
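The two shift types can be made concrete with a toy example: covariate shift perturbs surface statistics (here, additive noise on pixel values; the noise model and the `add_noise` helper are purely illustrative) while the semantic label stays the same, whereas semantic shift would instead replace the label with a class outside the ID set.

```python
import random

random.seed(0)  # for reproducibility of the illustration

def add_noise(image, sigma=0.1):
    """Simulate covariate shift: perturb pixel values (clipped to [0, 1]).
    The semantic content, and hence the label, is unchanged."""
    return [min(1.0, max(0.0, p + random.gauss(0.0, sigma))) for p in image]

clean = {"pixels": [0.2, 0.5, 0.8, 0.4], "label": "cat"}  # toy 4-pixel "image"
shifted = {"pixels": add_noise(clean["pixels"]), "label": clean["label"]}

# Covariate shift: surface statistics change, semantics do not.
print(shifted["label"])  # prints "cat" -> still ID under full-spectrum detection
# Semantic shift would instead be a genuinely new class,
# e.g. {"label": "airplane"} -> OOD.
```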
With this background and the figure above in mind, standard and full-spectrum OOD detection are easier to understand.
Standard OOD detection
Goal (1): train a classifier on the ID distribution that classifies ID data accurately. Here the test ID data is assumed to have no covariate shift relative to the training ID data.
Goal (2): based on the trained classifier, design an OOD detection method that distinguishes ID from OOD for any sample. In the figure above, this corresponds to distinguishing (a) from (c) and (d).
Full-spectrum OOD detection
Goal (1): similar to standard OOD detection, except that covariate shift is considered: regardless of whether a test ID image exhibits covariate shift relative to the training images, the classifier must classify it into the correct ID category (for example, a cat-vs-dog classifier should not only accurately classify "clean" cat and dog images but also generalize to noisy or blurry ones).
Goal (2): covariate-shifted ID samples are also considered; together with normal (non-shifted) ID samples, they must be distinguished from OOD samples. In the figure above, this corresponds to distinguishing (a) and (b) from (c) and (d).
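The two protocols differ only in how covariate-shifted ID samples are treated at test time, as the following sketch makes explicit (the sample-type names are ours, mirroring panels (a)-(d) of the figure):

```python
def detection_label(sample_type, full_spectrum=False):
    """Ground-truth ID/OOD label for each test-sample type.

    Sample types mirror the figure: (a) clean ID, (b) covariate-shifted ID,
    (c)/(d) semantically shifted (near/far) OOD.
    """
    if sample_type == "clean_id":              # (a)
        return "ID"
    if sample_type == "covariate_shifted_id":  # (b)
        # The only point where the two protocols disagree: standard OOD
        # detection assumes no covariate shift and leaves (b) out of the
        # test set, while full-spectrum detection treats it as ID.
        return "ID" if full_spectrum else "excluded"
    if sample_type in ("near_ood", "far_ood"):  # (c), (d)
        return "OOD"
    raise ValueError(f"unknown sample type: {sample_type}")

for t in ("clean_id", "covariate_shifted_id", "near_ood"):
    print(t, "->", detection_label(t, full_spectrum=True))
```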
Why is full-spectrum OOD detection important?
Attentive readers may have noticed that goal (1) of full-spectrum OOD detection actually corresponds to another very important research topic: out-of-distribution generalization (OOD generalization).
To clarify: "OOD" in OOD generalization refers to samples with covariate shift, whereas "OOD" in OOD detection refers to samples with semantic shift.
Both kinds of shift are ubiquitous in the real world, yet existing OOD generalization and standard OOD detection each consider only one of them and ignore the other.
In contrast, full-spectrum OOD detection naturally considers both shifts in the same scenario, more accurately reflecting our performance expectations for an ideal classifier in the open world.
Experimental results and new findings
In version 1.5, OpenOOD uniformly and comprehensively tested nearly 40 methods on 6 benchmark datasets (4 for standard OOD detection and 2 for full-spectrum OOD detection).
The implemented methods and datasets are described in the paper, and all experiments can be reproduced with the OpenOOD codebase. Here we directly discuss the findings drawn from the comparison.
[Table: OOD detection performance of nearly 40 methods across the benchmark datasets]
Finding 1: No single winner.
From the table above, it is easy to see that no method consistently delivers outstanding performance across all benchmark datasets.
For example, the post-hoc inference methods ReAct and ASH perform well on the large-scale ImageNet, but have no advantage over other methods on CIFAR.
Conversely, some methods that add constraints during training, such as RotPred and LogitNorm, outperform post-hoc methods on the small datasets but are unremarkable on ImageNet.
Finding 2: Data augmentation helps.
As the table above shows, although data augmentation is not specifically designed for OOD detection, it effectively improves OOD detection performance. Even more surprisingly, the improvements from data augmentation and those from specific OOD post-processing methods amplify each other.
Take AugMix as an example. Combined with the simplest MSP post-processor, it reaches a 77.49% near-OOD detection rate on ImageNet-1K, only 1.47% higher than training with plain cross-entropy loss and no augmentation.
When AugMix is combined with the more advanced ASH post-processor, however, the detection rate is 3.99% higher than the cross-entropy baseline, reaching 82.16%, the highest in our tests. Such results show that combining data augmentation with post-processing has great potential to further improve OOD detection.
Finding 3: Full-spectrum detection poses a challenge for current detectors.
The figure above clearly shows that when the scenario switches from standard to full-spectrum OOD detection (i.e., covariate-shifted ID images are added to the test ID data), most methods' performance degrades significantly (detection rate drops of more than 10%).
This means current methods tend to flag covariate-shifted ID images, whose actual semantics have not changed, as OOD.
This behavior runs contrary to human perception (and to the goal of full-spectrum OOD detection): if a human annotator labeling cat and dog pictures is shown a noisy, blurry picture of a cat or dog, he or she will still recognize it as a cat or dog, i.e., as in-distribution ID data rather than unknown OOD data.
In general, current methods cannot effectively handle full-spectrum OOD detection, and we believe this will be an important problem for the field.
Many further findings, such as that data augmentation remains effective for full-spectrum OOD detection, are omitted here; again, we invite everyone to read the paper.
Looking forward
We hope that OpenOOD's codebase, evaluator, leaderboard, benchmark datasets, and detailed test results can bring researchers together to advance the field. We look forward to everyone using OpenOOD to develop and test OOD detection methods.
We also welcome contributions to OpenOOD in any form, including but not limited to providing feedback, adding the latest methods to the OpenOOD codebase and leaderboard, and extending future versions of OpenOOD.
Reference: https://arxiv.org/abs/2306.09301