The first version of this article was written in May 2018, and it was finally published in December 2022. Over these four-plus years I received a lot of support and understanding from my advisors.
(I hope this experience also encourages students who are currently submitting papers: if the paper is well written, it will get in eventually. Don't give up easily!)
The early arXiv version is: Query Attack via Opposite-Direction Feature: Towards Robust Image Retrieval
Paper link: https://link.springer.com/article/10.1007/s11263-022-01737-y
Paper backup link: https://zdzheng.xyz/files/IJCV_Retrieval_Robustness_CameraReady.pdf
Code: https://github.com/layumi/U_turn
Authors: Zhedong Zheng, Liang Zheng, Yi Yang and Fei Wu
Compared with earlier versions,
- we have adjusted some formulas;
- we have added discussions of many new related works;
- we have added experiments from three new angles: multi-scale query attack, black-box attack, and defense;
- we have added new methods, comparisons, and new ways to visualize on datasets such as Food-256, Market-1501, CUB, Oxford, and Paris;
- we have attacked the PCB structure in re-ID and WideResNet on CIFAR-10.
In actual use, suppose we want to attack the image retrieval system of Google or Baidu to make big news (just kidding). We can download an image of a dog, extract its feature with an ImageNet-pretrained model (or another model, ideally one close to the target retrieval system), compute the adversarial noise that turns the feature around (the method in this article), and add the noise back onto the dog image. Then we feed the attacked dog image to the search-by-image function. You can see that neither Baidu's nor Google's system returns dog-related content, although we humans can still recognize that it is an image of a dog.
P.S. At that time, I also tried to attack Google's search-by-image. People can still recognize that it is an image of a dog, but Google often returns "mosaic"-related images. I suspect that Google does not rely entirely on deep features, or that its model is quite different from the ImageNet model, so after the attack the result often drifts to "mosaic" rather than to other entity categories (airplanes and the like). Of course, returning mosaics can be considered a success to some extent!
What
1. The original intention of this article is actually very simple. Existing re-ID models and landscape retrieval models have reached a Recall@1 above 95%. So can we design a way to attack the retrieval model? On the one hand, we explore the limits of the re-ID model; on the other hand, attacking serves better defense, so we also study how to defend against such abnormal inputs.
2. The retrieval model differs from the traditional classification model: at test time, the retrieval model compares extracted features and ranks the candidates by similarity, rather than predicting a class, as shown in the table below.
3. Another characteristic of the retrieval problem is that it is open-set: the categories at test time are often never seen during training. If you are familiar with the CUB dataset, under the retrieval setting the training set contains 100 bird species and the test set contains another 100 species, with no overlap between the two. Matching and ranking rely purely on the extracted visual features. Therefore, some classification attack methods are not suitable for attacking the retrieval model, because the gradient based on category prediction is often inaccurate during the attack.
4. When testing a retrieval model, the data has two parts: the query image and the gallery (which is large and generally inaccessible). Considering practical feasibility, our method mainly attacks the query image to cause wrong retrieval results.
How
1. A natural idea is to attack the feature. So how do we attack the feature? Based on our previous observations on the cross-entropy loss (please refer to the large-margin softmax loss paper), when we train with a classification loss, the feature f often has a radial distribution. This is because, during learning, the cosine similarity is computed between the feature and the weight W of the last classification layer. As shown in the figure below, after the model is learned, samples of the same class are distributed near that class's W, so that f·W reaches its maximum.
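For intuition, here is a tiny sketch (with made-up dimensions) of why a linear classification head induces this radial geometry:

```python
import torch
import torch.nn.functional as F

# The last linear layer scores class c by the inner product f . W_c.
# With L2-normalized f and W_c this is exactly the cosine similarity,
# so training pushes each feature toward the W_c of its own class,
# producing the radial distribution described above.
f = F.normalize(torch.randn(4, 512), dim=1)   # a batch of features
W = F.normalize(torch.randn(10, 512), dim=1)  # rows are class weights W_c
logits = f @ W.t()                            # logits[i, c] = cos(f_i, W_c)
```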
2. So we proposed a very simple method: make the feature turn around. As shown in the figure below, the two common classification attack methods can also be visualized in the same way. In (a), the category with the highest classification probability is suppressed (e.g., Fast Gradient): the gradient is taken along -Wmax, so the red propagation direction is opposite to Wmax. In (b), the feature of the least likely category is pulled up (e.g., Least-likely), so the red gradient is along Wmin.
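For reference, a minimal PyTorch-style sketch of these two baselines, assuming `model` outputs class logits (the function names and the step size `eps` are illustrative, not from the paper):

```python
import torch
import torch.nn.functional as F

def fast_gradient(model, x, y, eps=8/255):
    # (a) suppress the most likely class: one ascent step on the loss,
    # i.e., move the feature away from W_max
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def least_likely(model, x, eps=8/255):
    # (b) promote the least likely class: one descent step toward W_min
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)
    y_ll = logits.argmin(dim=1)              # least-likely class as target
    F.cross_entropy(logits, y_ll).backward()
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()
```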
3. These two classification attack methods are of course very direct and effective for traditional classification problems. However, since the test categories in the retrieval problem are all unseen (unseen bird species), the natural f does not closely fit Wmax or Wmin. Therefore, our strategy is very simple: since we have f, we can just move f to -f, as shown in Figure (c).
In this way, in the feature matching stage, the candidates that originally ranked at the top will, ideally, rank at the bottom: their cosine similarity with -f changes from close to 1 to close to -1. This achieves the goal of attacking the retrieval ranking.
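Below is a minimal sketch of this feature U-turn in PyTorch. It is an illustration under assumptions, not the exact released implementation: `model` is assumed to map an image batch to feature embeddings, and `epsilon`/`num_steps` are made-up hyper-parameters (see the official repo above for the real code).

```python
import torch
import torch.nn.functional as F

def u_turn_attack(model, query, epsilon=8/255, num_steps=10):
    # extract the clean feature f and use -f as the target direction
    with torch.no_grad():
        f = F.normalize(model(query), dim=1)
    x_adv = query.clone().detach()
    alpha = epsilon / num_steps                     # per-step budget
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        f_adv = F.normalize(model(x_adv), dim=1)
        # minimizing the cosine similarity drives f_adv toward -f
        sim = (f_adv * f).sum(dim=1).mean()
        grad, = torch.autograd.grad(sim, x_adv)
        x_adv = x_adv.detach() - alpha * grad.sign()
        # project back into the L_inf ball and the valid pixel range
        x_adv = (query + (x_adv - query).clamp(-epsilon, epsilon)).clamp(0, 1)
    return x_adv.detach()
```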
4. A small extension. In retrieval problems, multi-scale query augmentation is often used, so we also studied how to maintain the attack effect in this case. (The main difficulty is that the resize operation may smooth out some small but critical perturbations.) Our way of dealing with it is also very simple: just like a model ensemble, we take the ensemble average of the adversarial gradients over multiple scales, as sketched below.
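A sketch of this multi-scale gradient ensemble, reusing the notation of the previous snippet (the scale set and helper name are assumptions):

```python
import torch
import torch.nn.functional as F

def multiscale_grad(model, x_adv, f, scales=(0.75, 1.0, 1.25)):
    # average the adversarial gradient over several input scales, like a
    # model ensemble, so the perturbation survives the resize step of
    # multi-scale querying; x_adv must carry requires_grad=True
    total = torch.zeros_like(x_adv)
    for s in scales:
        xs = F.interpolate(x_adv, scale_factor=s, mode='bilinear',
                           align_corners=False)
        f_adv = F.normalize(model(xs), dim=1)
        sim = (f_adv * f).sum(dim=1).mean()     # cosine similarity to f
        grad, = torch.autograd.grad(sim, x_adv)
        total = total + grad
    # plug this averaged gradient into the sign step of the attack loop
    return total / len(scales)
```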
Experiment
1. On three datasets and under three metrics, we fix the perturbation magnitude (the epsilon on the horizontal axis) and compare which method makes the retrieval model err more at the same magnitude. Our method, the yellow curve, is always at the bottom, which means its attack effect is stronger.
2. At the same time, we also provide quantitative experimental results on 5 datasets (Food, CUB, Market, Oxford, Paris).
3. To examine the mechanism of the model, we also tried to attack the classification model on CIFAR-10.
You can see that our strategy of flipping the last-layer feature also strongly suppresses top-5 accuracy. For top-1, since there is no candidate category to move toward, it is slightly weaker than least-likely, but almost the same.
4. Black-box attack
We also tried to use adversarial examples generated with ResNet50 to attack a black-box DenseNet model (whose parameters are not available to us), and found that good transfer attack capability can also be achieved.
5. Adversarial defense
We trained a defense model with online adversarial training. We found that it is still not immune to new white-box attacks, but under small perturbations it is more stable (drops fewer points) than a completely undefended model.
6. Visualization of feature movement
This is also my favorite experiment. On CIFAR-10, we change the dimension of the last classification layer to 2 so that we can plot how the features before the classification layer move.
As shown in the figure below, as the perturbation magnitude epsilon increases, the features of the samples slowly "turn around"; for example, most of the orange features have moved to the opposite side.
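For readers who want to reproduce this kind of plot, here is a rough sketch under heavy assumptions: `backbone` must be a CIFAR-10 model retrained with a 2-D penultimate feature, `loader` a test DataLoader, and `attack` something like the u_turn_attack above.

```python
import torch
import matplotlib.pyplot as plt

def plot_2d_features(backbone, loader, attack, eps_list=(0.0, 4/255, 8/255)):
    # one panel per epsilon: scatter the 2-D features of the test set
    # and watch them "turn around" as the perturbation grows
    fig, axes = plt.subplots(1, len(eps_list), figsize=(4 * len(eps_list), 4))
    for ax, eps in zip(axes, eps_list):
        feats, labels = [], []
        for x, y in loader:
            if eps > 0:
                x = attack(backbone, x, epsilon=eps)
            with torch.no_grad():
                feats.append(backbone(x))    # backbone outputs 2-D features
            labels.append(y)
        f, y = torch.cat(feats), torch.cat(labels)
        ax.scatter(f[:, 0], f[:, 1], c=y, s=2, cmap='tab10')
        ax.set_title(f'epsilon = {eps:.3f}')
    plt.show()
```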