OpenOOD v1.5: A comprehensive and accurate out-of-distribution detection codebase and testing platform, with an online leaderboard and one-click evaluation
Out-of-distribution (OOD) detection is crucial for the reliable operation of open-world intelligent systems, but current OOD detection methods suffer from evaluation inconsistencies.
The previous release, OpenOOD v1, unified the evaluation of OOD detection, but it still had limitations in scalability and usability.
The development team has now released OpenOOD v1.5. Compared with the previous version, the new release significantly improves the accuracy, standardization, and user-friendliness of OOD detection evaluation.
Paper: https://arxiv.org/abs/2306.09301
OpenOOD Codebase: https://github.com/Jingkang50/OpenOOD
OpenOOD Leaderboard: https://zjysteven.github.io/OpenOOD/
It is worth noting that OpenOOD v1.5 extends its evaluation capabilities to large-scale datasets such as ImageNet, investigates the important but underexplored full-spectrum OOD detection, and introduces new features including an online leaderboard and an easy-to-use evaluator.
This work also contributes in-depth analysis and insights drawn from comprehensive experimental results, enriching the knowledge base of OOD detection methods.
With these enhancements, OpenOOD v1.5 aims to drive the progress of OOD research and provide a more powerful and comprehensive evaluation benchmark for OOD detection research.
For a trained image classifier, a key capability for working reliably in the open world is detecting unknown, out-of-distribution (OOD) samples.
For example, suppose we train a cat-vs-dog classifier on a set of cat and dog photos. For in-distribution (ID) samples, i.e., cat and dog images, we naturally expect the classifier to assign them to the correct category.
For out-of-distribution (OOD) samples, i.e., any images other than cats and dogs (such as airplanes or fruits), we hope the model can detect that they are unknown, novel objects/concepts, and therefore refuse to assign them to any of the in-distribution categories.
This problem is out-of-distribution detection (OOD detection). It has attracted widespread attention in recent years, with new work emerging constantly. However, while the field is expanding rapidly, it has become difficult to track and measure its progress, for several reasons.
The rapid development of various deep learning tasks is inseparable from unified test datasets (just as CIFAR and ImageNet serve image classification, and PASCAL VOC and COCO serve object detection).
Unfortunately, the field of OOD detection has long lacked a unified, widely adopted OOD dataset. As a result, when we look back at the experimental settings of existing work, the OOD data used is highly inconsistent (for example, with CIFAR-10 as the ID data, some works use MNIST and SVHN as OOD, while others use CIFAR-100 and Tiny ImageNet). Under such circumstances, a direct and fair comparison of all methods is very difficult.
In addition to OOD detection, terms such as Open-Set Recognition (OSR) and Novelty Detection also frequently appear in the literature.
They essentially address the same problem, with only minor differences in experimental settings. However, the different terminology has led to unnecessary fragmentation among methods. For example, OOD detection and OSR were once regarded as two independent tasks, and methods from different branches were rarely compared against each other, even though they were solving the same problem.
In many works, researchers directly use samples from the OOD test set to tune hyperparameters or even train models. Such a practice overestimates a method's true OOD detection capability.
The above problems are obviously detrimental to the orderly development of the field. We urgently need a unified benchmark and platform to test and evaluate existing and future OOD detection methods.
OpenOOD was created to address these challenges. Its first version took an important step forward, but its limited scale and usability left room for improvement.
Therefore, in the new version, OpenOOD v1.5, we have further strengthened and upgraded it, aiming to provide a comprehensive, accurate, and easy-to-use testing platform for researchers.
In summary, OpenOOD has the following important features and contributions:
The codebase decouples and modularizes model architectures, data preprocessing, post-processing, training, testing, and so on, to facilitate reuse and further development. Currently, OpenOOD implements nearly 40 state-of-the-art OOD detection methods for image classification tasks.
With just a few lines of code, OpenOOD's evaluator reports the OOD detection performance of a given classifier and post-processor on a specified ID dataset (see the sketch below).
The corresponding OOD data is determined and provided internally by the evaluator, which ensures consistency and fairness across tests. The evaluator supports both standard OOD detection and full-spectrum OOD detection scenarios (more on this later).
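For illustration, here is a minimal sketch of how the evaluator is typically invoked, following the usage pattern shown in the OpenOOD repository. Exact argument names and defaults may differ in your installed version, and the checkpoint path is a placeholder.

```python
import torch
from openood.evaluation_api import Evaluator
from openood.networks import ResNet18_32x32

# Load a CIFAR-10 classifier (checkpoint path is a placeholder).
net = ResNet18_32x32(num_classes=10)
net.load_state_dict(torch.load('./cifar10_resnet18.ckpt'))
net.cuda().eval()

# The evaluator prepares the ID and OOD splits internally, so every
# method is tested against exactly the same data.
evaluator = Evaluator(
    net,
    id_name='cifar10',          # ID dataset that defines the benchmark
    data_root='./data',
    preprocessor=None,          # default preprocessing for the ID dataset
    postprocessor_name='msp',   # e.g. MSP; other post-processors plug in the same way
    batch_size=200,
    num_workers=2,
)

metrics = evaluator.eval_ood()  # standard OOD detection
print(metrics)                  # near-/far-OOD metrics such as AUROC and FPR@95
```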
Using OpenOOD, we compared the performance of nearly 40 OOD detection methods on four ID datasets: CIFAR-10, CIFAR-100, ImageNet-200, and ImageNet-1K, and compiled the results into a public leaderboard. We hope this helps everyone keep track of the most effective and promising methods in the field.
Based on the comprehensive experimental results of OpenOOD, we provide many new findings in the paper. For example, although it seems to have little to do with OOD detection, data augmentation can actually effectively improve the performance of OOD detection, and this improvement is orthogonal and complementary to the improvement brought by specific OOD detection methods.
In addition, we found that existing methods perform unsatisfactorily on full-spectrum OOD detection, which will be an important problem for the field to solve going forward.
This section briefly and informally describes the goals of standard and full-spectrum OOD detection. For a more detailed and formal description, please refer to our paper.
[Figure: semantic shift (horizontal axis) vs. covariate shift (vertical axis), with panels (a)-(d) covering ID, covariate-shifted ID, and OOD samples]
First, some background. In the image classification scenario considered here, the in-distribution (ID) data is defined by the corresponding classification task. For example, for CIFAR-10 classification, the ID distribution corresponds to its 10 semantic categories.
The concept of OOD is defined relative to ID: images belonging to any semantic category other than the ID categories are out-of-distribution (OOD) images. At the same time, we need to distinguish the following two types of distribution shift.
Semantic shift: the distribution changes at the semantic level, corresponding to the horizontal axis of the figure above. For example, the training categories are cats and dogs, while the test categories are airplanes and fruits.
Covariate shift: the distribution changes at the surface statistical level while the semantics remain unchanged, corresponding to the vertical axis of the figure above. For example, training uses clean, natural photos of cats and dogs, while testing uses noisy or hand-drawn images of cats and dogs.
With this background and the figure above in mind, standard and full-spectrum OOD detection are easier to understand.
Standard OOD detection. Goal (1): train a classifier on the ID distribution so that it classifies ID data accurately. It is assumed here that there is no covariate shift between the test ID data and the training ID data.
Goal (2): based on the trained classifier, design an OOD detection method that distinguishes ID from OOD for any sample. In the figure above, this corresponds to distinguishing (a) from (c) and (d).
Full-spectrum OOD detection. Goal (1): similar to standard OOD detection, except that covariate shift is taken into account: regardless of whether a test ID image exhibits covariate shift relative to the training images, the classifier must still assign it to the correct ID category (for example, the cat-vs-dog classifier should not only classify "clean" cat and dog images accurately, but also generalize to noisy or blurry cat and dog images).
Goal (2): covariate-shifted ID samples are also considered; together with normal (non-covariate-shifted) ID samples, they must be distinguished from OOD samples. In the figure above, this corresponds to distinguishing (a) and (b) from (c) and (d).
Attentive readers may have noticed that goal (1) of full-spectrum OOD detection actually corresponds to another very important research topic: out-of-distribution generalization (OOD generalization).
It needs to be clarified that OOD in OOD generalization refers to samples with covariate shift, while OOD in OOD detection refers to samples with semantic shift.
Both kinds of shift are very common in the real world. However, existing OOD generalization and standard OOD detection each consider only one of them and ignore the other.
In contrast, full-spectrum OOD detection naturally considers both shifts in the same scenario, more accurately reflecting what we expect from an ideal classifier in the open world.
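In OpenOOD v1.5 the same evaluator can run either protocol. Below is a minimal sketch, assuming the fsood switch exposed in the repository's evaluation API (the argument name may differ in your installed version); it reuses the evaluator object from the earlier example.

```python
# Standard OOD detection: the test ID data has no covariate shift.
standard_metrics = evaluator.eval_ood(fsood=False)

# Full-spectrum OOD detection: covariate-shifted ID sets (e.g. corrupted or
# stylized versions of the ID classes) are added to the ID side and must
# NOT be flagged as OOD.
fsood_metrics = evaluator.eval_ood(fsood=True)
```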
In version 1.5, OpenOOD uniformly and comprehensively tests nearly 40 methods on 6 benchmarks (4 for standard OOD detection and 2 for full-spectrum OOD detection).
The implemented methods and datasets are described in the paper, and all experiments can be reproduced with the OpenOOD codebase. Here we discuss the findings drawn directly from the comparison results.
[Table: OOD detection performance of the evaluated methods across the benchmark datasets]
From the table above, it is not hard to see that no method consistently delivers outstanding performance across all benchmark datasets.
For example, the post-hoc inference methods ReAct and ASH perform well on the large-scale ImageNet benchmark but show no advantage over other methods on CIFAR.
Conversely, some training-time methods that add constraints during training, such as RotPred and LogitNorm, outperform post-hoc methods on the small datasets but are unremarkable on ImageNet.
As shown in the table above, although data augmentation techniques are not specifically designed for OOD detection, they can effectively improve OOD detection performance. Even more surprisingly, the gains from data augmentation and the gains from specific OOD post-processing methods amplify each other.
Take AugMix as an example. When combined with the simplest MSP post-processor, it reaches 77.49% near-OOD detection performance on ImageNet-1K, only 1.47% higher than the same post-processor trained with plain cross-entropy loss and no data augmentation.
However, when AugMix is combined with the more advanced ASH post-processor, the corresponding detection performance is 3.99% higher than the cross-entropy baseline and reaches 82.16%, the highest in our tests. These results suggest that combining data augmentation with post-processing has great potential for further improving OOD detection.
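Since MSP comes up repeatedly as the simplest baseline post-processor, here is a minimal generic sketch of the idea (not the exact OpenOOD implementation): the OOD score is the classifier's maximum softmax probability, and samples scoring below a threshold are flagged as OOD.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def msp_score(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Maximum Softmax Probability (MSP): higher score = more confidently ID."""
    logits = model(x)
    return F.softmax(logits, dim=1).max(dim=1).values

# Usage sketch: flag inputs whose MSP falls below a threshold as OOD.
# The threshold is typically chosen on ID data, e.g. so that 95% of ID
# samples are retained.
# scores = msp_score(net, images)
# is_ood = scores < threshold
```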
The results clearly show that when the scenario switches from standard OOD detection to full-spectrum OOD detection (that is, covariate-shifted ID images are added to the test ID data), the performance of most methods degrades significantly (detection rates drop by more than 10%).
This means that current methods tend to mark covariate-shifted ID images, whose semantics have not actually changed, as OOD.
This behavior runs contrary to human perception (and to the goal of full-spectrum OOD detection): suppose a human annotator is labeling cat and dog pictures and is shown a noisy, blurry picture of a cat or dog; he/she should still recognize it as a cat/dog, i.e., as in-distribution (ID) data rather than unknown out-of-distribution (OOD) data.
Overall, current methods cannot effectively solve full-spectrum OOD detection, and we believe this will be an important open problem for the field.
There are many other findings not listed here, for example that data augmentation remains effective for full-spectrum OOD detection. Once again, we welcome everyone to read our paper.
We hope that OpenOOD's codebase, evaluator, leaderboard, benchmark datasets, and detailed test results can bring researchers together to advance the field, and we look forward to everyone using OpenOOD to develop and test OOD detection methods.
We also welcome contributions to OpenOOD in any form, including but not limited to providing feedback, adding the latest methods to the OpenOOD codebase and leaderboard, and extending future versions of OpenOOD.
Reference: https://arxiv.org/abs/2306.09301