Transformer's 6th Anniversary: It Didn't Even Get a NeurIPS Oral Back Then, but Its 8 Authors Have Founded Several AI Unicorns
From ChatGPT to AI image generation, the recent wave of breakthroughs in artificial intelligence may well owe its existence to the Transformer.
Today marks the sixth anniversary of the submission of the famous Transformer paper.
## Paper link: https://arxiv.org/abs/1706.03762
Six years ago, a paper with a somewhat grandiose title was uploaded to the preprint platform arXiv. The phrase "xx is All You Need" has since been repeated over and over by developers in the AI field, even becoming a trend in paper titles, and Transformer no longer just refers to the Transformers of the movies: it now stands for the most advanced technology in the field of AI.
Six years later, looking back at this paper reveals many interesting or little-known details, as NVIDIA AI scientist Jim Fan has summarized.
"Attention mechanism" is not what the author of Transformer proposedThe Transformer model abandons the tradition Of CNN and RNN units, the entire network structure is entirely composed of attention mechanisms.
Although the Transformer paper is titled "Attention Is All You Need", and the attention mechanism has been celebrated ever since, note an interesting fact: the Transformer researchers did not invent attention; rather, they pushed the mechanism to its extreme.
The attention mechanism was proposed in 2014 by a team led by deep learning pioneer Yoshua Bengio, in a paper with a comparatively plain title: "Neural Machine Translation by Jointly Learning to Align and Translate".
In this ICLR 2015 paper, Bengio et al. proposed combining an RNN with a "context vector" (i.e., attention). Although it is one of the great milestones of NLP, it is far less well-known than the Transformer: the Bengio team's paper has been cited about 29,000 times to date, compared with 77,000 for the Transformer paper.
AI's attention mechanism is naturally modeled on human visual attention. The human brain has an innate ability: when we look at a picture, we first scan it quickly and then focus on the region that actually deserves attention.
Attending to every local detail would mean a great deal of wasted effort, which is not conducive to survival. Likewise, introducing a similar mechanism into deep learning networks can simplify models and speed up computation. In essence, attention filters a small amount of important information out of a large amount of information, focuses on that important information, and ignores most of what is unimportant.
In recent years, attention mechanisms have been widely used across deep learning, for example to capture receptive fields on images in computer vision, or to locate key tokens or features in NLP. A large body of experiments has shown that models with attention mechanisms achieve significant performance improvements in tasks such as image classification, segmentation, tracking, and enhancement, as well as in natural language recognition, understanding, question answering, and translation.
The Transformer model, built around the attention mechanism, can be regarded as a general-purpose sequence computer. When processing an input sequence, attention lets the model assign different weights based on how strongly different positions in the sequence correlate with one another. These attention weights allow the Transformer to capture long-range dependencies and contextual information, improving how sequences are processed.
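To make the idea of attention weights concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside the Transformer, written in plain NumPy. The toy shapes and random inputs are illustrative assumptions, not code from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core operation of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of every query to every key
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

# Toy self-attention example: 4 positions, 8-dimensional representations (Q = K = V = x).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))  # row i shows how much position i attends to each other position
```

Each row of the resulting weight matrix shows how one position distributes its attention over the entire sequence, which is how the Transformer relates distant positions to one another without any recurrence.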
But at the time, neither the Transformer paper nor the original attention paper talked about a universal sequence computer. Instead, the authors saw it as a mechanism for solving a narrow, specific problem: machine translation. So when we trace the origins of AGI in the future, we may well be able to trace them back to the "humble" Google Translate.
Although the Transformer paper is hugely influential today, at NeurIPS 2017, the world's top AI conference that year, it did not even get an Oral, let alone an award. The conference received 3,240 paper submissions that year, of which 678 were accepted as conference papers, the Transformer paper among them. Of these, 40 were Oral papers and 112 were Spotlight papers; three Best Paper awards and one Test of Time award were given, and Transformer received none of them.
Although it missed the NeurIPS 2017 paper award, the influence of Transformer is obvious to all.
Jim Fan commented: it is hard for people to recognize the importance of an influential study before it becomes influential, and that is not the reviewers' fault. Still, some papers are lucky enough to be recognized immediately: ResNet, proposed by Kaiming He and others, won the Best Paper award at CVPR 2016, a well-deserved honor correctly recognized by a top AI conference. But back in 2017, even very smart researchers could not have predicted the changes that LLMs are bringing about today, just as in the 1980s few could foresee the tsunami that deep learning has unleashed since 2012.
The paper had eight authors, from Google and the University of Toronto. Five years later, most of them had left their original institutions.
On April 26, 2022, a company called "Adept" was officially founded with nine co-founders, including two authors of the Transformer paper, Ashish Vaswani and Niki Parmar.
Ashish Vaswani obtained his PhD from the University of Southern California under the supervision of David Chiang and Liang Huang, focusing mainly on early applications of modern deep learning to language modeling. In 2016 he joined Google Brain, where he led the Transformer research before leaving Google in 2021.
Niki Parmar graduated from the University of Southern California with a master's degree and joined Google in 2016. While there, she developed some successful Q&A and text similarity models for Google search and ads. She led early work extending the Transformer model into areas such as image generation, computer vision, and more. In 2021, she also left Google.
After leaving Google, the two co-founded Adept, serving as chief scientist (Ashish Vaswani) and chief technology officer (Niki Parmar), respectively. Adept's vision is to build an "AI teammate" trained to use a wide variety of software tools and APIs.
In March 2023, Adept announced a US$350 million Series B round, valuing the company at over US$1 billion and making it a unicorn. By the time the funding was announced publicly, however, Niki Parmar and Ashish Vaswani had already left Adept to found a new AI company of their own; that company remains in stealth, and detailed information about it is not available.
Another author, Noam Shazeer, was one of Google's most important early employees. He joined Google at the end of 2000 and finally left in 2021, going on to become CEO of a startup called Character.AI.
Character.AI's other founder is Daniel De Freitas; both came from Google's LaMDA team, where they had built LaMDA, the language model powering Google's conversational programs.
In March this year, Character.AI announced US$150 million in financing at a valuation of US$1 billion. It is one of the few startups seen as a potential competitor to OpenAI, the organization behind ChatGPT, and one of the rare companies to reach unicorn status in just 16 months. Its application, Character.AI, is a neural-language-model chatbot that generates human-like text responses and carries on contextual conversations.
Character.AI was released on the Apple App Store and Google Play Store on May 23, 2023, and was downloaded more than 1.7 million times in its first week. In May 2023, the service added a $9.99-per-month paid subscription called c.ai, which gives users priority chat access, faster response times, and early access to new features, among other perks.
Aidan N. Gomez left Google as early as 2019, then worked as a researcher at FOR.ai, and is now co-founder and CEO of Cohere.
Cohere is a generative AI startup founded in 2019. Its core business includes providing NLP models and helping enterprises improve human-computer interaction. The three founders are Ivan Zhang, Nick Frosst and Aidan Gomez, among whom Gomez and Frosst are former members of the Google Brain team. In November 2021, Google Cloud announced that they would be partnering with Cohere, with Google Cloud using its robust infrastructure to power the Cohere platform, and Cohere using Cloud's TPUs to develop and deploy its products.
It is worth noting that Cohere has just raised US$270 million in Series C financing, becoming a unicorn valued at US$2.2 billion.
Łukasz Kaiser left Google in 2021 after 7 years and 9 months there and is now a researcher at OpenAI. As a research scientist at Google, he helped design SOTA neural models for machine translation, parsing, and other algorithmic and generative tasks, and was a co-author of the TensorFlow system and the Tensor2Tensor library.
Jakob Uszkoreit left Google in 2021 after 13 years there and co-founded Inceptive, an AI pharmaceutical company dedicated to using deep learning to design RNA drugs.
While at Google, Jakob Uszkoreit helped form the Google Assistant language understanding team and, early in his time there, also worked on Google Translate.
Illia Polosukhin left Google in 2017 and is now co-founder and CTO of NEAR.AI (a blockchain infrastructure company).
The only author still at Google is Llion Jones; this year marks his ninth year at the company.
Six years have now passed since the publication of "Attention Is All You Need". Some of the original authors have chosen to leave, and some have chosen to stay at Google; either way, the Transformer's influence continues.