You Yang's team wins an AAAI 2023 Distinguished Paper Award: training a model 72 times faster on a single V100
This article is reprinted with authorization from QbitAI (WeChat public account ID: QbitAI). Please contact the source for reprint permission.
You Yang, who holds a Ph.D. from UC Berkeley and is a Presidential Young Professor at the National University of Singapore, has just announced some news: his team has won an AAAI 2023 Distinguished Paper Award!
The award-winning work speeds up model training by a factor of 72.
Even netizens marveled after reading the paper: "From 12 hours to 10 minutes, niu!" (The original comment is a Chinese pun on "niu", literally "cow", slang for impressive, and the professor's surname, You.)
During his studies, Dr. You Yang set world records for ImageNet and BERT training speed, and the algorithms he designed are widely used by technology giants such as Google, Microsoft, Intel, and NVIDIA.
Now, a year and a half after returning to China to found his own company, Luchen Technology, what kind of algorithm did he and his team come up with to win such an honor at a top AI conference?
In this work, You Yang's team proposed an optimization strategy called CowClip, which effectively accelerates large-batch training of CTR prediction models.
CTR (click-through rate) prediction models are a core algorithm in personalized recommendation scenarios. They typically need to learn from user feedback (clicks, favorites, purchases, and so on), and the volume of such data generated online every day is enormous. Speeding up the training of CTR prediction models is therefore crucial.
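For readers unfamiliar with the setup, below is a minimal, illustrative CTR model in PyTorch: sparse categorical features (user id, item id, and so on) are looked up in embedding tables and fed to a small MLP that outputs a click probability. The field names, table sizes, and layer widths are made-up assumptions for illustration; real models such as DeepFM are considerably more elaborate.

```python
import torch
import torch.nn as nn

class TinyCTRModel(nn.Module):
    """Illustrative CTR model: embed sparse feature ids, predict a click probability."""

    def __init__(self, num_ids_per_field=(10_000, 50_000, 1_000), embed_dim=16):
        super().__init__()
        # one embedding table per categorical field (e.g. user id, item id, category)
        self.embeddings = nn.ModuleList(
            nn.Embedding(n, embed_dim) for n in num_ids_per_field
        )
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim * len(num_ids_per_field), 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, field_ids):
        # field_ids: (batch, num_fields) integer tensor of feature ids
        vecs = [emb(field_ids[:, i]) for i, emb in enumerate(self.embeddings)]
        logits = self.mlp(torch.cat(vecs, dim=1))
        return torch.sigmoid(logits).squeeze(-1)  # predicted click-through rate
```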
A common way to speed up training is to enlarge the batch size, but if the batch is too large, model accuracy tends to drop. Through mathematical analysis, the team showed that the learning rate for infrequent features should not be scaled up when the batch size is enlarged. With their proposed CowClip, the batch size can then be expanded simply and effectively.
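The sketch below illustrates the general idea of per-id gradient clipping for the embedding table, as a rough interpretation of the approach described above: each feature id's gradient is clipped relative to the norm of that id's current embedding weights, so rare ids are not over-updated at large batch sizes. The function name, hyperparameters (clip_ratio, zeta), and the exact clipping formula are assumptions for illustration, not the paper's precise algorithm.

```python
import torch

def clip_embedding_grads(embedding_weight, embedding_grad, clip_ratio=1.0, zeta=1e-4):
    """Sketch of CowClip-style per-row clipping for an embedding table's gradients."""
    # per-row L2 norms of weights and gradients, shape: (num_ids, 1)
    w_norm = embedding_weight.norm(dim=1, keepdim=True)
    g_norm = embedding_grad.norm(dim=1, keepdim=True)
    # per-row clip threshold: proportional to the weight norm, floored at zeta
    threshold = clip_ratio * torch.clamp(w_norm, min=zeta)
    # scale factor <= 1 wherever the gradient norm exceeds the threshold
    scale = torch.clamp(threshold / (g_norm + 1e-12), max=1.0)
    return embedding_grad * scale
```

In such a setup, the clipped gradients would be written back to the embedding table before the optimizer step, and, in line with the finding above, the embedding (infrequent-feature) learning rate would be kept unscaled while only the dense layers' learning rate grows with the batch size.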
In tests on four CTR prediction models and two datasets, the team successfully expanded the original batch size by 128 times without any loss of accuracy.
In particular, on DeepFM, CowClip achieves an AUC improvement of more than 0.1% while expanding the batch size from 1K to 128K.
On a single V100 GPU, training time drops from the original 12 hours to just 10 minutes, a 72-fold speedup.
Currently, the project code is open source. The team says the algorithm is also suitable for tasks such as NLP.
The paper's first author is Zheng Zangwei, a doctoral student of You Yang. He received his bachelor's degree from the elite computer science program at Nanjing University and is pursuing his Ph.D. at the National University of Singapore. His research interests include machine learning, computer vision, and high-performance computing.