Three papers solve the problem of 'Optimization and Evaluation of Semantic Segmentation'! Leuven/Tsinghua/Oxford and others jointly proposed a new method

王林

Feb 06, 2024 pm 09:15 PM

Commonly used loss functions for optimizing semantic segmentation models include the Soft Jaccard loss, the Soft Dice loss, and the Soft Tversky loss. However, these loss functions are incompatible with soft labels, so they cannot support important training techniques such as label smoothing, knowledge distillation, semi-supervised learning, and learning from multiple annotators. Because these techniques matter for the performance and robustness of semantic segmentation models, the loss functions need to be adapted to support them.

On the other hand, commonly used evaluation metrics for semantic segmentation include mAcc and mIoU. However, these metrics are biased toward larger objects, which seriously affects how well they assess a model's safety performance.

To solve these problems, researchers at the University of Leuven and Tsinghua University first proposed the JDT loss. The JDT loss is a family of variants of the original loss functions, comprising the Jaccard Metric loss, the Dice Semimetric loss, and the Compatible Tversky loss. It is equivalent to the original loss functions on hard labels and is also fully applicable to soft labels. This improvement makes model training more accurate and stable.

The researchers successfully applied the JDT loss in four important scenarios: label smoothing, knowledge distillation, semi-supervised learning, and multiple annotators. These applications demonstrate the power of the JDT loss to improve model accuracy and calibration.

Paper link: https://arxiv.org/pdf/2302.05666.pdf

Paper link: https://arxiv.org/pdf/2303.16296.pdf

In addition, the researchers proposed fine-grained evaluation metrics. These metrics are less biased toward large objects, provide richer statistical information, and offer valuable insights for model and dataset auditing.

Moreover, the researchers conducted an extensive benchmark study. It emphasizes that evaluation should not rely on a single metric, and it reveals the important role that the neural network architecture and the JDT loss play in optimizing fine-grained metrics.

Paper link: https://arxiv.org/pdf/2310.19252.pdf

Code link: https://github.com/zifuwanggg/JDTLosses

Existing loss functions

Since the Jaccard Index and the Dice Score are defined on sets, they are not differentiable. There are currently two common ways to make them differentiable. The first uses the relationship between a set and the Lp norm of its corresponding vector, as in the Soft Jaccard loss (SJL), the Soft Dice loss (SDL), and the Soft Tversky loss (STL). These losses write the size of a set as the L1 norm of the corresponding vector, and the intersection of two sets as the inner product of the two corresponding vectors.

The second uses the submodularity of the Jaccard Index to apply the Lovász extension to the set function, as in the Lovász-Softmax loss (LSL).

These loss functions assume that the output x of the neural network is a continuous vector and that the label y is a discrete binary vector. With soft labels, y is no longer a discrete binary vector but a continuous one, and these loss functions are no longer compatible.

Taking SJL as an example, consider a simple single-pixel case, where x and y are scalars in [0, 1]:

$$\text{SJL}(x, y) = 1 - \frac{xy}{x + y - xy}$$

For any y > 0, SJL is minimized at x = 1 and maximized at x = 0. Since a loss function should be minimized at x = y, this is clearly unreasonable.
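The single-pixel argument can be checked numerically. The sketch below is only an illustration: the function name `soft_jaccard_loss`, the soft label 0.8, and the grid of candidate predictions are assumptions chosen for the example, not values from the papers.

```python
def soft_jaccard_loss(x, y):
    """Single-pixel Soft Jaccard loss for a prediction x and a label y in [0, 1]."""
    intersection = x * y              # inner product of two one-element vectors
    union = x + y - intersection      # sum of the L1 norms minus the intersection
    return 1.0 - intersection / union


y = 0.8                               # soft label (illustrative value)
xs = [i / 10 for i in range(11)]      # candidate predictions 0.0, 0.1, ..., 1.0
losses = [soft_jaccard_loss(x, y) for x in xs]
best_x = xs[losses.index(min(losses))]

print(f"loss at x = y: {soft_jaccard_loss(y, y):.3f}")
print(f"loss at x = 1: {soft_jaccard_loss(1.0, y):.3f}")
print(f"minimizer on the grid: {best_x}")
```

On this grid the loss keeps decreasing as x grows, so the minimizer is x = 1 rather than x = y = 0.8, which is the incompatibility described above.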

Loss functions compatible with soft labels

To make the original loss functions compatible with soft labels, the symmetric difference between the two sets is introduced when computing their intersection and union. Writing |x ∩ y|, |x ∪ y|, and |x △ y| for the sizes of the intersection, union, and symmetric difference of the sets represented by x and y:

$$|x \cap y| = \frac{|x| + |y| - |x \triangle y|}{2}, \qquad |x \cup y| = \frac{|x| + |y| + |x \triangle y|}{2}$$

Note that the size of the symmetric difference between the two sets can be written as the L1 norm of the difference between the two corresponding vectors:

$$|x \triangle y| = \|x - y\|_1$$

Putting the above together, we propose the JDT losses: the Jaccard Metric loss (JML), a variant of SJL; the Dice Semimetric loss (DML), a variant of SDL; and the Compatible Tversky loss (CTL), a variant of STL.
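As a hedged worked derivation, rewriting the intersection and union through the symmetric difference and substituting them into the Jaccard index and the Dice score gives the forms below. The notation (x for the prediction, y for the label) is assumed for illustration, CTL is left out because its weighting terms are not described in this article, and the linked papers remain the authoritative definitions.

```latex
% Assumed notation: x is the prediction, y the (possibly soft) label, both in [0, 1]^n.
\begin{align*}
|x \cap y| &= \tfrac{1}{2}\left(\|x\|_1 + \|y\|_1 - \|x - y\|_1\right), \\
|x \cup y| &= \tfrac{1}{2}\left(\|x\|_1 + \|y\|_1 + \|x - y\|_1\right), \\
\text{JML}(x, y) &= 1 - \frac{|x \cap y|}{|x \cup y|}
  = 1 - \frac{\|x\|_1 + \|y\|_1 - \|x - y\|_1}{\|x\|_1 + \|y\|_1 + \|x - y\|_1}, \\
\text{DML}(x, y) &= 1 - \frac{2\,|x \cap y|}{\|x\|_1 + \|y\|_1}
  = \frac{\|x - y\|_1}{\|x\|_1 + \|y\|_1}.
\end{align*}
```

Both expressions vanish exactly when the L1 distance between x and y is zero, which is what makes them usable with soft labels.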

Properties of the JDT loss

We prove that the JDT loss has the following properties.

Property 1: JML is a metric, and DML is a semimetric.

Property 2: When y is a hard label, JML is equivalent to SJL, DML is equivalent to SDL, and CTL is equivalent to STL.

Property 3: When y is a soft label, JML, DML, and CTL are all compatible with soft labels, that is, x = y ⇔ f(x, y) = 0.

Property 1 is why they are named the Jaccard Metric loss and the Dice Semimetric loss. Property 2 shows that in the common setting where only hard labels are used for training, the JDT loss can directly replace the existing loss functions without changing anything.
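Properties 2 and 3 can be checked on a small example. This is a minimal sketch in plain Python, assuming that JML and DML are obtained by replacing the inner-product intersection with the symmetric-difference form; the function names and the sample vectors are illustrative and are not taken from the authors' code.

```python
def l1(v):
    """L1 norm of a vector, used as the size of the corresponding set."""
    return sum(abs(a) for a in v)


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def l1_diff(x, y):
    """L1 norm of x - y, used as the size of the symmetric difference."""
    return sum(abs(a - b) for a, b in zip(x, y))


def sjl(x, y):
    inter = dot(x, y)
    return 1.0 - inter / (l1(x) + l1(y) - inter)


def sdl(x, y):
    return 1.0 - 2.0 * dot(x, y) / (l1(x) + l1(y))


def jml(x, y):
    size, diff = l1(x) + l1(y), l1_diff(x, y)
    return 1.0 - (size - diff) / (size + diff)


def dml(x, y):
    size, diff = l1(x) + l1(y), l1_diff(x, y)
    return 1.0 - (size - diff) / size


x = [0.9, 0.2, 0.6, 0.1]           # predicted probabilities (illustrative)
y_hard = [1.0, 0.0, 1.0, 0.0]      # hard label
y_soft = [0.8, 0.1, 0.7, 0.2]      # soft label

# Property 2: on hard labels the new losses match the original ones.
print(abs(jml(x, y_hard) - sjl(x, y_hard)) < 1e-12)
print(abs(dml(x, y_hard) - sdl(x, y_hard)) < 1e-12)

# Property 3: with a soft label, JML and DML are zero at x = y, while SJL is not.
print(jml(y_soft, y_soft), dml(y_soft, y_soft), sjl(y_soft, y_soft))
```

The sketch assumes that x and y are not both all-zero; a practical implementation would need to guard that division.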

How to use the JDT loss

We conducted extensive experiments and summarize some notes on using the JDT loss.

Note 1: Choose the loss function that matches the evaluation metric. If the metric is the Jaccard Index, choose JML; if the metric is the Dice Score, choose DML; if false positives and false negatives should be weighted differently, choose CTL. In addition, when optimizing fine-grained evaluation metrics, the JDT loss should be adapted accordingly.

Note 2: Combine the JDT loss with a pixel-level loss function (such as the Cross Entropy loss or the Focal loss). We found that 0.25 CE + 0.75 JDT is generally a good choice.

Note 3: It is best to train for fewer epochs. With the JDT loss added, training generally needs only half as many epochs as training with the Cross Entropy loss alone.

Note 4: When performing distributed training on multiple GPUs without additional communication between GPUs, the JDT loss will unintentionally optimize the fine-grained evaluation metrics, which degrades performance on the traditional mIoU.

Note 5: When training on a dataset with extreme class imbalance, keep in mind that the JDT loss is computed separately for each category and then averaged, which may make training unstable.
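Notes 2 and 5 can be illustrated together. The sketch below combines a pixel-wise cross entropy with a per-category JML using the 0.25 / 0.75 weights mentioned in Note 2. It is a plain-Python illustration under stated assumptions: the function names, the nested-list layout `probs[pixel][category]`, the `eps` constant, and the choice to skip categories with an empty denominator are all hypothetical and do not come from the JDTLosses repository.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean pixel-wise cross entropy; labels may be one-hot or soft."""
    total = 0.0
    for p_row, y_row in zip(probs, labels):
        total -= sum(y * math.log(p + eps) for p, y in zip(p_row, y_row))
    return total / len(probs)


def jaccard_metric_loss(probs, labels):
    """JML computed separately per category, then averaged (cf. Note 5)."""
    per_category = []
    for c in range(len(probs[0])):
        x = [row[c] for row in probs]
        y = [row[c] for row in labels]
        size = sum(x) + sum(y)
        diff = sum(abs(a - b) for a, b in zip(x, y))
        if size + diff == 0:
            continue  # assumption: skip categories absent from both x and y
        per_category.append(1.0 - (size - diff) / (size + diff))
    return sum(per_category) / len(per_category)


def combined_loss(probs, labels, ce_weight=0.25, jdt_weight=0.75):
    """Weighted sum of the pixel-level loss and the JDT loss (cf. Note 2)."""
    return (ce_weight * cross_entropy(probs, labels)
            + jdt_weight * jaccard_metric_loss(probs, labels))


# Three pixels, three categories: softmax outputs and one-hot labels.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(combined_loss(probs, labels))
```

Because every category contributes equally to the average regardless of how many pixels it covers, a category with very few pixels can swing the loss strongly, which is one way to read the instability warning in Note 5.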

Experimental results

Experiments show that, compared with the Cross Entropy loss baseline, adding the JDT loss effectively improves model accuracy when training with hard labels. Introducing soft labels further improves both the accuracy and the calibration of the model.

By only adding a JDT loss term during training, we achieve state-of-the-art (SOTA) results for semantic segmentation in knowledge distillation, semi-supervised learning, and learning from multiple annotators.

Existing evaluation metrics

Semantic segmentation is a pixel-level classification task, so accuracy can be computed over all pixels: overall pixel-wise accuracy (Acc). However, because Acc is biased toward the majority category, PASCAL VOC 2007 adopted a metric that computes the pixel accuracy of each category separately and then averages the results: mean pixel-wise accuracy (mAcc).

But mAcc does not consider false positives, so since PASCAL VOC 2008 the mean intersection over union (per-dataset mIoU, mIoUD) has been used as the evaluation metric. PASCAL VOC was the first dataset to introduce the semantic segmentation task, and the evaluation metrics it used were widely adopted by subsequent datasets.
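The difference between Acc and mAcc can be seen on a toy example. The pixel counts below are made up for illustration, and the function names are hypothetical.

```python
def pixel_accuracy(correct, total):
    """Overall pixel-wise accuracy (Acc): correct pixels over all pixels."""
    return sum(correct) / sum(total)


def mean_pixel_accuracy(correct, total):
    """Mean pixel-wise accuracy (mAcc): per-category accuracy, then averaged."""
    per_category = [c / t for c, t in zip(correct, total) if t > 0]
    return sum(per_category) / len(per_category)


# Two categories: a dominant background and a rare object (illustrative counts).
total = [950, 50]     # ground-truth pixels per category
correct = [940, 10]   # correctly classified pixels per category

acc = pixel_accuracy(correct, total)
macc = mean_pixel_accuracy(correct, total)
print(f"Acc = {acc:.3f}, mAcc = {macc:.3f}")
```

Acc stays high because the background dominates the pixel count, while mAcc drops once the rare category is weighted equally.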

Specifically, IoU can be written as:

$$\text{IoU} = \frac{\text{TP}}{\text{TP} + \text{FP} + \text{FN}}$$

To compute mIoUD, we first accumulate, for each category c, the true positives (TP), false positives (FP), and false negatives (FN) over all I photos in the entire dataset:

$$\text{TP}_c = \sum_{i=1}^{I} \text{TP}_{i,c}, \qquad \text{FP}_c = \sum_{i=1}^{I} \text{FP}_{i,c}, \qquad \text{FN}_c = \sum_{i=1}^{I} \text{FN}_{i,c}$$

With these values for each category, we average over the C categories to remove the bias toward the majority category:

$$\text{mIoU}^{D} = \frac{1}{C} \sum_{c=1}^{C} \frac{\text{TP}_c}{\text{TP}_c + \text{FP}_c + \text{FN}_c}$$

Because mIoUD sums together the TP, FP and FN of all pixels in the entire dataset, it will inevitably be biased towards those large-sized objects.
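The per-dataset computation and its bias toward large objects can be sketched as follows. The data layout `counts[i][c] = (TP, FP, FN)` and the pixel counts are assumptions made for this example.

```python
def miou_d(counts):
    """Per-dataset mIoU: sum TP, FP, FN over all photos, then average over categories.

    counts[i][c] is a (TP, FP, FN) tuple of pixel counts for category c on photo i.
    """
    ious = []
    for c in range(len(counts[0])):
        tp = sum(photo[c][0] for photo in counts)
        fp = sum(photo[c][1] for photo in counts)
        fn = sum(photo[c][2] for photo in counts)
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


# One category ("car") on two photos: a large, well-segmented car and a
# small, badly segmented one (illustrative pixel counts).
counts = [
    [(9000, 500, 500)],   # photo 0: IoU = 0.9
    [(10, 45, 45)],       # photo 1: IoU = 0.1
]
per_photo_iou = [tp / (tp + fp + fn) for ((tp, fp, fn),) in counts]

print(f"mIoUD = {miou_d(counts):.3f}")
print(f"mean of per-photo IoUs = {sum(per_photo_iou) / len(per_photo_iou):.3f}")
```

Here mIoUD is about 0.89 although the small car is segmented poorly, because its pixels are a tiny fraction of the accumulated counts; the plain mean of the two per-photo IoUs is 0.5.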

In some application scenarios with high safety requirements, such as autonomous driving and medical images, there are often objects that are small but cannot be ignored.

As shown in the picture below, cars appear at very different sizes in different photos. Therefore, mIoUD's bias toward large objects seriously affects its evaluation of a model's safety performance.

[Picture: cars of very different sizes in different photos]

Fine-grained evaluation metrics

To address the problem with mIoUD, we propose fine-grained evaluation metrics. These metrics compute IoU on each photo separately, which effectively reduces the bias toward large objects.

mIoUI

For each category c, we compute an IoU on each photo i:

$$\text{IoU}_{i,c} = \frac{\text{TP}_{i,c}}{\text{TP}_{i,c} + \text{FP}_{i,c} + \text{FN}_{i,c}}$$

Next, for each photo i, we average over all categories that appear in that photo (denoted here by the set $\mathcal{C}_i$):

$$\text{IoU}_{i} = \frac{1}{|\mathcal{C}_i|} \sum_{c \in \mathcal{C}_i} \text{IoU}_{i,c}$$

Finally, we average the values of all I photos:

$$\text{mIoU}^{I} = \frac{1}{I} \sum_{i=1}^{I} \text{IoU}_{i}$$
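The three steps above can be sketched in plain Python. This is a minimal illustration: the `(TP, FP, FN)` layout is assumed, and so is the rule for when a category "appears" in a photo (here, whenever TP + FP + FN > 0); the papers may define presence differently, for example from the ground truth only.

```python
def iou_or_null(tp, fp, fn):
    """IoU of one category on one photo, or None when the category is absent."""
    denom = tp + fp + fn
    return tp / denom if denom > 0 else None   # assumption: absent means empty counts


def miou_i(counts):
    """Per-image mIoU: average over categories within a photo, then over photos.

    counts[i][c] is a (TP, FP, FN) tuple for category c on photo i.
    Returns the metric together with the per-photo values.
    """
    per_photo = []
    for photo in counts:
        ious = [iou_or_null(tp, fp, fn) for tp, fp, fn in photo]
        present = [v for v in ious if v is not None]
        if present:
            per_photo.append(sum(present) / len(present))
    return sum(per_photo) / len(per_photo), per_photo


counts = [
    [(90, 5, 5), (0, 0, 0)],        # photo 0: only category 0 appears
    [(80, 10, 10), (20, 40, 40)],   # photo 1: both categories appear
]
value, per_photo = miou_i(counts)
print(value, per_photo)
```

Returning the per-photo values alongside the final number keeps them available for inspecting individual photos.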

mIoUC

Similarly, after computing the IoU of each category c on each photo i, we can average, for each category c, over all the photos in which it appears (the set $\mathcal{I}_c$, containing $I_c$ photos):

$$\text{IoU}_{c} = \frac{1}{I_c} \sum_{i \in \mathcal{I}_c} \text{IoU}_{i,c}$$

Finally, we average the values of all C categories:

$$\text{mIoU}^{C} = \frac{1}{C} \sum_{c=1}^{C} \text{IoU}_{c}$$

Because not every category appears in every photo, some combinations of category and photo produce NULL values, as shown in the figure below. When computing mIoUI, the categories are averaged first and then the photos; when computing mIoUC, the photos are averaged first and then the categories.

As a result, mIoUI may be biased toward categories that appear frequently (such as C1 in the figure below), which is generally undesirable. On the other hand, because mIoUI yields an IoU value for each photo, it helps us audit and analyze the model and the dataset.

[Picture: per-photo, per-category IoU values, with NULL entries where a category does not appear in a photo]
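The effect of the averaging order can be seen on a small IoU table with NULL entries. In this sketch `None` stands for NULL, and the function names and IoU values are made up for illustration.

```python
def mean_ignoring_null(values):
    """Average the non-NULL entries; return None if there are none."""
    kept = [v for v in values if v is not None]
    return sum(kept) / len(kept) if kept else None


def miou_i(iou):
    """Average over categories within each photo first, then over photos."""
    return mean_ignoring_null([mean_ignoring_null(row) for row in iou])


def miou_c(iou):
    """Average over photos for each category first, then over categories."""
    return mean_ignoring_null([mean_ignoring_null(col) for col in zip(*iou)])


# Rows are photos, columns are categories. The first category appears in every
# photo, the second in only one (illustrative values).
iou = [
    [0.9, None],
    [0.8, 0.2],
    [0.7, None],
]
print(f"mIoUI = {miou_i(iou):.3f}")
print(f"mIoUC = {miou_c(iou):.3f}")
```

With these numbers mIoUI is about 0.7 while mIoUC is about 0.5: the frequent, well-segmented category dominates mIoUI, whereas mIoUC weights both categories equally.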

Worst-case evaluation metrics

In safety-critical applications, we often care more about the worst-case segmentation quality, and one benefit of fine-grained metrics is that the corresponding worst-case metrics can be computed from them. We take mIoUC as an example; the worst-case counterpart of mIoUI can be computed in a similar way.

For each category c, we first sort the IoU values of all the photos in which it appears (say there are Ic such photos) in ascending order. Next, we set q to a small number, such as 1 or 5. Then we compute the final value using only the first Ic × q% of the sorted photos, that is, those with the lowest IoU.

With a value for each category c, we average over categories as before to obtain the worst-case counterpart of mIoUC.
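The worst-case procedure can be sketched as below. The rounding rule is an assumption: this article does not say how Ic × q% is turned into a photo count, so the sketch rounds down and keeps at least one photo. The function name and the data are illustrative.

```python
import math


def worst_case_miou_c(iou, q=5):
    """Worst-case variant of mIoUC using the lowest q% of photos per category.

    iou[i][c] is the IoU of category c on photo i, or None when c is absent.
    """
    per_category = []
    for column in zip(*iou):
        values = sorted(v for v in column if v is not None)   # ascending order
        if not values:
            continue
        # Assumption: round down, but always keep at least one photo.
        k = max(1, math.floor(len(values) * q / 100))
        per_category.append(sum(values[:k]) / k)
    return sum(per_category) / len(per_category)


# One category on 20 photos with IoU values 0.05, 0.10, ..., 1.00, in arbitrary order.
values = [i / 20 for i in range(1, 21)]
iou = [[v] for v in reversed(values)]

print(worst_case_miou_c(iou, q=5))    # uses the single worst photo
print(worst_case_miou_c(iou, q=10))   # uses the two worst photos
```

On this example the plain mIoUC is 0.525, while the worst-case values are far lower because only the hardest photos count.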

Experimental results

We trained 15 models on 12 data sets and discovered the following phenomena.

Phenomenon 1: No model achieves the best result on all evaluation metrics. Each metric has a different focus, so multiple metrics need to be considered together for a comprehensive evaluation.

Phenomenon 2: Some datasets contain photos on which almost all models achieve a very low IoU. This is partly because the photos themselves are very challenging, for example containing very small objects or strong contrast between light and dark, and partly because the labels of these photos are problematic. Fine-grained evaluation metrics can therefore help with model auditing (finding scenarios where models make mistakes) and dataset auditing (finding wrong labels).

Phenomenon 3: The architecture of the neural network plays a crucial role in optimizing fine-grained evaluation metrics. On the one hand, the larger receptive field provided by modules such as ASPP (adopted by DeepLabV3 and DeepLabV3+) helps the model recognize large objects, which effectively improves mIoUD. On the other hand, long skip connections between the encoder and decoder (adopted by UNet and DeepLabV3+) enable the model to recognize small objects, which improves the fine-grained evaluation metrics.

Phenomenon 4: The value of a worst-case metric is far lower than that of the corresponding average metric. The following table shows the mIoUC and corresponding worst-case values of DeepLabV3-ResNet101 on multiple datasets. A question worth considering in the future is how to design the neural network architecture and the optimization method to improve a model's performance on worst-case metrics.

[Table: mIoUC and corresponding worst-case values of DeepLabV3-ResNet101 on multiple datasets]

Phenomenon 5: The loss function plays a crucial role in optimizing fine-grained evaluation metrics. Compared with the Cross Entropy loss baseline, shown as (0, 0, 0) in the following table, using the loss function that matches a fine-grained metric greatly improves the model's performance on that metric. For example, on ADE20K, the gap in mIoUC between JML and Cross Entropy exceeds 7%.

[Table: fine-grained metric values for the Cross Entropy baseline (0, 0, 0) and the corresponding JDT loss settings]

Future work

We only considered the JDT loss as a loss function for semantic segmentation, but it can also be applied to other tasks, such as traditional classification.

In addition, the JDT loss is currently used only in label space, but we believe it can also be used to minimize the distance between any two vectors in feature space, for example as a replacement for the Lp norm or the cosine distance.

References:

https://arxiv.org/pdf/2302.05666.pdf

https://arxiv.org/pdf/2303.16296.pdf

https://arxiv.org/pdf/2310.19252.pdf

Statement

This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.
