The research was questioned, Jeff Dean responded: We were not trying to get a new SOTA, and the cost calculation was also wrong
Yesterday, the hottest topic in the machine learning community was a researcher on reddit questioning a paper co-authored by Google AI lead Jeff Dean. The paper, "An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems," was submitted to the preprint platform arXiv on Thursday.
In the paper, Jeff Dean and his co-author propose an evolutionary algorithm that can generate large-scale multi-task models while supporting the dynamic, continuous addition of new tasks. The generated multi-task models are sparsely activated and use task-based routing. The new method achieves competitive results on 69 image classification tasks, including a new state-of-the-art accuracy of 99.43% on CIFAR-10 among models trained only on public data.
It is this new CIFAR-10 SOTA that drew the criticism; the previous SOTA was 99.40%. "Producing this result required a total of 17,810 TPU core-hours," she wrote. "If you don't work at Google, that means paying the on-demand rate of $3.22/hour, so training this model costs $57,348."
She then posed the pointed question: "Jeff Dean spent enough money to support a family of four for five years, achieved a 0.03% improvement on CIFAR-10, and created a new SOTA. Was it really worth it?"
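Her headline figure is simple arithmetic; as a quick sanity check, using the core-hour count and on-demand rate quoted in the thread:

```python
# Quick check of the figure quoted above: 17,810 TPU core-hours priced at
# the on-demand rate of $3.22 per core-hour.
core_hours = 17_810
on_demand_rate = 3.22  # USD per core-hour
print(f"${core_hours * on_demand_rate:,.0f}")  # -> $57,348
```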
The question resonated with many people in the field. Some researchers even said, pessimistically: "I have almost lost interest in deep learning. As a practitioner in a small lab, it is basically impossible to compete with the tech giants on compute budget. Even if you have a good theoretical idea, biases in the mainstream environment can keep it from ever seeing the light of day. This creates an unfair playing field."
As the topic continued to ferment, Jeff Dean personally responded on reddit: "The goal of our research was not to obtain a higher-quality cifar10 model, and the original poster's cost calculation is also flawed."
This paper was completed by Andrea Gesmundo and me, and Andrea Gesmundo did most of the work on it.
Paper address: https://arxiv.org/pdf/2205.12755.pdf
What I want to say is that the goal of this research was not to get a high-quality cifar10 model. Rather, it explores a setting in which new tasks can be dynamically introduced into a running system, yielding a high-quality model for each new task that reuses representations from the existing model and introduces new parameters sparsely, while avoiding multi-task problems such as catastrophic forgetting or negative transfer.
The experiments in this study show that we can dynamically introduce a stream of 69 different tasks drawn from several independent visual task benchmarks, ultimately producing a multi-task system that jointly delivers high-quality solutions for all of these tasks. The resulting model is sparsely activated for any given task, and the system introduces fewer and fewer new parameters as new tasks arrive (see Figure 2). By the end of the task stream, the multi-task system introduced only 1.4% new parameters per incremental task, and each task activates on average 2.3% of the model's total parameters. There is considerable representation sharing between tasks, and the evolutionary process helps determine when sharing makes sense and when new trainable parameters should be introduced for a new task.
I also think the author of the original post got the cost calculation wrong. The experiment trained a multi-task model that jointly solves 69 tasks, not just a cifar10 model. As Table 7 shows, the compute used was a mix of TPUv3 and TPUv4 cores, so core-hours cannot simply be added up, since the two are priced differently.
Unless you urgently need a model for cifar10 plus 68 other tasks trained very quickly, this kind of research can easily use preemptible resources, at $0.97/hour for TPUv4 and $0.60/hour for TPUv3 (not the $3.22/hour on-demand pricing the post assumes). Under these assumptions, the public cloud compute cost of the work described in Table 7 is roughly $13,960 (12,861 TPUv4 chip-hours and 2,474.5 TPUv3 chip-hours at preemptible prices), or about $202 per task.
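Jeff Dean's revised estimate can be checked the same way; the chip-hour counts and preemptible rates below are the ones he quotes, and dividing by the 69 tasks gives the per-task figure:

```python
# Revised estimate: the Table 7 chip-hour counts priced at the preemptible
# rates cited above ($0.97/h for TPUv4, $0.60/h for TPUv3), spread over the
# 69 tasks the model solves jointly.
tpuv4_hours, tpuv3_hours = 12_861, 2_474.5
total = tpuv4_hours * 0.97 + tpuv3_hours * 0.60
print(f"total:    ${total:,.0f}")       # -> $13,960
print(f"per task: ${total / 69:,.0f}")  # -> $202
```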
I think it is important to have sparsely activated models and to be able to dynamically introduce new tasks into an existing system that can share representations where appropriate and avoid catastrophic forgetting; at the very least, these directions are worth exploring. The system also has the advantage that new tasks can be incorporated automatically, without having to be hand-formulated (that is what the evolutionary search process does), which seems like a useful property of a continual learning system.
The code for this paper is open source, so you can take a look for yourself.
Code address: https://github.com/google-research/google-research/tree/master/muNet
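For readers who just want the gist of the approach before digging into the repository, here is a minimal, hypothetical sketch of the core idea: a sparsely activated, task-routed multi-task model that grows as tasks are added. It is not the paper's muNet code and omits the evolutionary search entirely; the class, layer sizes, and task names are invented for illustration.

```python
import torch
import torch.nn as nn

class SparseMultiTaskNet(nn.Module):
    """Toy illustration: a shared trunk plus per-task routes.

    Each task activates only the components on its own route, so adding a
    new task reuses the shared representation and introduces only a small
    head (and optionally a small adapter) of new parameters.
    """

    def __init__(self, in_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.hidden = hidden
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.adapters = nn.ModuleDict()  # optional new per-task parameters
        self.heads = nn.ModuleDict()     # one output layer per task

    def add_task(self, name: str, num_classes: int, new_adapter: bool = False):
        # Reuse existing representations; only the head (and possibly a small
        # adapter) adds parameters, so per-task growth stays small.
        if new_adapter:
            self.adapters[name] = nn.Linear(self.hidden, self.hidden)
        self.heads[name] = nn.Linear(self.hidden, num_classes)

    def forward(self, x: torch.Tensor, task: str) -> torch.Tensor:
        h = self.trunk(x)                         # shared across all tasks
        if task in self.adapters:                 # only this task's route runs
            h = torch.relu(self.adapters[task](h))
        return self.heads[task](h)

model = SparseMultiTaskNet()
model.add_task("cifar10", num_classes=10)
model.add_task("flowers", num_classes=102, new_adapter=True)
logits = model(torch.randn(4, 32), task="cifar10")
print(logits.shape)  # torch.Size([4, 10])
```

In the actual paper, decisions such as when to reuse, clone, or add components are made by the evolutionary search rather than by hand as in this sketch.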
The author of the original post replied to Jeff Dean
After seeing Jeff Dean's reply, the author of the original post responded: To be clear, I find Jeff Dean's paper (using evolution to generate the model expansion for each task) genuinely interesting. It reminds me of another paper whose title I can't recall, which added new modules to the overall architecture for each new task, using the hidden states of existing modules as part of each layer's input, but without updating the weights of the existing components.
I have also had an idea about building per-task modules into a model. You know how a baby deer can walk within minutes of being born? At that point a newborn fawn has essentially no "training data" for learning to sense movement or model the world; instead it must rely on specialized, inherited structures in the brain that give it those basic skills. Such structures would be very useful, in the sense that they generalize quickly to new but related control tasks.
So this paper got me thinking about developing inheritable structures like these, which could be used to learn new tasks more efficiently.
Researchers in another lab might have the same idea but get much worse results, because they cannot afford to move from their existing setup to a large cloud platform. And because the community is now overly focused on SOTA results, their research cannot get published. Even if the cost is "only" $202 per task, it takes many iterations to get things right.
So for those of us without access to a large compute budget, there are essentially two options. One is to hope that Google releases the trained model publicly so we can fine-tune it for our needs, even though the model may have learned biases or adversarial weaknesses we cannot remove. The other is to simply give up.
So my problem is not with this study in particular. If OpenAI wants to spend hundreds of billions of dollars (figuratively speaking) on GPT-4, more power to them. My problem is with a scientific and publishing culture that over-rewards flash, big numbers, and lavish compute, rather than helping people do real work better. My favorite paper is van den Oord's "Representation Learning with Contrastive Predictive Coding" (2019), which uses an unsupervised pre-training task followed by supervised training on a small subset of the labels to match the accuracy of training on the fully labeled data, and discusses the improvement from a data-efficiency perspective. I reproduced those results and used them in my own work, saving myself time and money. On the strength of that paper alone, I would gladly become his doctoral student.
In contrast, OpenAI's paper "Language Models are Few-Shot Learners", which introduced the much larger transformer model GPT-3, has received nearly 4,000 citations, won the NeurIPS 2020 Best Paper Award, and attracted all of the media attention.