Transformer's 6th Anniversary: It Didn't Even Get a NeurIPS Oral Back Then, but Its 8 Authors Have Founded Several AI Unicorns
From ChatGPT to AI image generation, the recent wave of breakthroughs in artificial intelligence may well owe its existence to the Transformer.
Today marks the sixth anniversary of the submission of the famous Transformer paper.
## Paper link: https://arxiv.org/abs/1706.03762
Six years ago, a paper with a somewhat grandiose title was uploaded to the preprint platform arXiv. The phrase "xx is All You Need" has since been repeated over and over by developers in the AI field, even becoming a trend in paper titles, and Transformer no longer just refers to the Transformers of the movies: it now stands for the most advanced technology in the field of AI.
Six years later, looking back at this paper reveals many interesting or little-known details, as NVIDIA AI scientist Jim Fan has summarized.
"Attention mechanism" is not what the author of Transformer proposedThe Transformer model abandons the tradition Of CNN and RNN units, the entire network structure is entirely composed of attention mechanisms.
Although the Transformer paper is titled "Attention Is All You Need", and the attention mechanism has been celebrated ever since, note an interesting fact: the Transformer researchers did not invent attention; rather, they pushed the mechanism to its extreme.
The attention mechanism was proposed in 2014 by a team led by deep learning pioneer Yoshua Bengio, in a paper with a comparatively plain title: "Neural Machine Translation by Jointly Learning to Align and Translate".
In this ICLR 2015 paper, Bengio et al. proposed combining an RNN with a "context vector" (i.e., attention). Although it is one of the great milestones of NLP, it is far less well-known than the Transformer: the Bengio team's paper has been cited about 29,000 times to date, compared with 77,000 for the Transformer paper.
AI's attention mechanism is naturally modeled on human visual attention. The human brain has an innate ability: when we look at a picture, we first scan it quickly and then focus on the region that actually deserves attention.
Attending to every local detail would mean a great deal of wasted effort, which is not conducive to survival. Likewise, introducing a similar mechanism into deep learning networks can simplify models and speed up computation. In essence, attention filters a small amount of important information out of a large amount of information, focuses on that important information, and ignores most of what is unimportant.
In recent years, attention mechanisms have been widely used across deep learning, for example to capture receptive fields on images in computer vision, or to locate key tokens or features in NLP. A large body of experiments has shown that models with attention mechanisms achieve significant performance improvements in tasks such as image classification, segmentation, tracking, and enhancement, as well as in natural language recognition, understanding, question answering, and translation.
The Transformer model, built around the attention mechanism, can be regarded as a general-purpose sequence computer. When processing an input sequence, attention lets the model assign different weights based on how strongly different positions in the sequence correlate with one another. These attention weights allow the Transformer to capture long-range dependencies and contextual information, improving how sequences are processed.
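To make the idea of attention weights concrete, here is a minimal sketch of scaled dot-product attention, the core operation inside the Transformer, written in plain NumPy. The toy shapes and random inputs are illustrative assumptions, not code from the paper:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the core operation of the Transformer."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of every query to every key
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V, weights

# Toy self-attention example: 4 positions, 8-dimensional representations (Q = K = V = x).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output, attn = scaled_dot_product_attention(x, x, x)
print(attn.round(2))  # row i shows how much position i attends to each other position
```

Each row of the resulting weight matrix shows how one position distributes its attention over the entire sequence, which is how the Transformer relates distant positions to one another without any recurrence.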
But at the time, neither the Transformer paper nor the original attention paper talked about a universal sequence computer. Instead, the authors saw it as a mechanism for solving a narrow, specific problem: machine translation. So when we trace the origins of AGI in the future, we may well be able to trace them back to the "humble" Google Translate.
Although the Transformer paper is hugely influential today, at NeurIPS 2017, the world's top AI conference that year, it did not even get an Oral, let alone an award. The conference received 3,240 paper submissions that year, of which 678 were accepted as conference papers, the Transformer paper among them. Of these, 40 were Oral papers and 112 were Spotlight papers; three Best Paper awards and one Test of Time award were given, and Transformer received none of them.
Although it missed the NeurIPS 2017 paper award, the influence of Transformer is obvious to all.
Jim Fan commented: it is hard for people to recognize the importance of an influential study before it becomes influential, and that is not the reviewers' fault. Still, some papers are lucky enough to be recognized immediately: ResNet, proposed by Kaiming He and others, won the Best Paper award at CVPR 2016, a well-deserved honor correctly recognized by a top AI conference. But back in 2017, even very smart researchers could not have predicted the changes that LLMs are bringing about today, just as in the 1980s few could foresee the tsunami that deep learning has unleashed since 2012.
The paper had eight authors, from Google and the University of Toronto. Five years later, most of them had left their original institutions.
On April 26, 2022, a company called "Adept" was officially founded with nine co-founders, including two authors of the Transformer paper, Ashish Vaswani and Niki Parmar.
Ashish Vaswani obtained his PhD from the University of Southern California under the supervision of David Chiang and Liang Huang, focusing mainly on early applications of modern deep learning to language modeling. In 2016 he joined Google Brain, where he led the Transformer research before leaving Google in 2021.
Niki Parmar graduated from the University of Southern California with a master's degree and joined Google in 2016. While there, she developed some successful Q&A and text similarity models for Google search and ads. She led early work extending the Transformer model into areas such as image generation, computer vision, and more. In 2021, she also left Google.
After leaving Google, the two co-founded Adept, serving as chief scientist (Ashish Vaswani) and chief technology officer (Niki Parmar), respectively. Adept's vision is to build an "AI teammate" trained to use a wide variety of software tools and APIs.
In March 2023, Adept announced a US$350 million Series B round, valuing the company at over US$1 billion and making it a unicorn. By the time the funding was announced publicly, however, Niki Parmar and Ashish Vaswani had already left Adept to found a new AI company of their own; that company remains in stealth, and detailed information about it is not available.
Another author, Noam Shazeer, was one of Google's most important early employees. He joined Google at the end of 2000 and finally left in 2021, going on to become CEO of a startup called Character.AI.
Character.AI's other founder is Daniel De Freitas; both came from Google's LaMDA team, where they had built LaMDA, the language model powering Google's conversational programs.
In March this year, Character.AI announced US$150 million in financing at a valuation of US$1 billion. It is one of the few startups seen as a potential competitor to OpenAI, the organization behind ChatGPT, and one of the rare companies to reach unicorn status in just 16 months. Its application, Character.AI, is a neural-language-model chatbot that generates human-like text responses and carries on contextual conversations.
Character.AI was released on the Apple App Store and Google Play Store on May 23, 2023, and was downloaded more than 1.7 million times in its first week. In May 2023, the service added a $9.99-per-month paid subscription called c.ai, which gives users priority chat access, faster response times, and early access to new features, among other perks.
Aidan N. Gomez left Google as early as 2019, then worked as a researcher at FOR.ai, and is now co-founder and CEO of Cohere.
Cohere is a generative AI startup founded in 2019. Its core business includes providing NLP models and helping enterprises improve human-computer interaction. The three founders are Ivan Zhang, Nick Frosst and Aidan Gomez, among whom Gomez and Frosst are former members of the Google Brain team. In November 2021, Google Cloud announced that they would be partnering with Cohere, with Google Cloud using its robust infrastructure to power the Cohere platform, and Cohere using Cloud's TPUs to develop and deploy its products.
It is worth noting that Cohere has just raised US$270 million in Series C financing, becoming a unicorn valued at US$2.2 billion.
Łukasz Kaiser left Google in 2021 after 7 years and 9 months there and is now a researcher at OpenAI. As a research scientist at Google, he helped design SOTA neural models for machine translation, parsing, and other algorithmic and generative tasks, and was a co-author of the TensorFlow system and the Tensor2Tensor library.
Jakob Uszkoreit left Google in 2021 after 13 years there and co-founded Inceptive, an AI pharmaceutical company dedicated to using deep learning to design RNA drugs.
While at Google, Jakob Uszkoreit helped form the Google Assistant language understanding team and, early in his time there, also worked on Google Translate.
Illia Polosukhin left Google in 2017 and is now co-founder and CTO of NEAR.AI (a blockchain infrastructure company).
The only author still at Google is Llion Jones; this year marks his ninth year at the company.
Six years have now passed since the publication of "Attention Is All You Need". Some of the original authors have chosen to leave, and some have chosen to stay at Google; either way, the Transformer's influence continues.