Alibaba's GPT-style model trained a "Niaoniao" voice double in 15 days, and it is far more exciting than ChatGPT+Siri!

2023-04-07 16:43:04

You may not believe it, but we have spent the past two days with the stand-up comedian "Niaoniao."

Let's start with one of the most popular memes from Xiaoguo Culture last year: "Do you know about shrimp boyfriends?"

It immediately replied: "I heard the 'shrimp boyfriend' meme has been popular online recently. It refers to boys with good figures but not very good-looking faces. Like shrimp, they are fine once the head is removed. The term caught on recently because stand-up comedian Li Dan brought up netizens' antics on the show."

A certain Dan is coming for you, shrimp vein in hand

Listen carefully to this voice: doesn't it sound like Niaoniao?

"Niaoniao" came to Xinzhiyuan?

1. Plenty of personality

Couldn't tell? Then let's ask a question that pins Niaoniao down precisely: are you an introvert?

After a moment, Niaoniao's distinctive voice slowly answered: "Yes, I am an introverted girl with social anxiety. I often feel invisible, and even on stage I don't dare to express myself. But I don't feel inferior because of this. Instead, I hope to help more people like me through my performances and let them know that, deep down, we too have our own unique existence and value."

Audio: social anxiety is OK too

Well, now I believe you really are Niaoniao.

It just so happens that Niaoniao went to Milan a while ago, so why not have it share the experience?

"I recently went to Milan, Italy, for Fashion Week and took part in a very interesting event. Although I was nervous and my feet hurt, I learned a lot. I also went to see Milan Cathedral, which was stunning. I think Italy is very similar to my hometown, Inner Mongolia: very welcoming."

Audio: went to Milan for Fashion Week

2. Composition Master

That sounds very interesting. Can you write an essay on this topic?

It obliged with ease: "When I walk through the streets of Milan, I feel as if I am in the world of the Renaissance. The many ancient and modern buildings here symbolize Milan's long history of culture and art. I saw many painters at work, using their own brushstrokes to depict the beautiful scenery of Milan, which filled me with an unspeakable intoxication."

Audio: Milan, unspeakable intoxication

Asked to give the essay a title, it thought for a moment and blurted out:

"How to make a trip to Milan the best memory of your life." What a brilliant literary talent!

Hey, the editor suddenly had an idea...

Last year, Niaoniao finished runner-up in a stand-up comedy competition, impressing audiences with her finely crafted writing. So let's ask it: "How do you judge the level of a stand-up comedian?"

It replied: "First, it depends on whether the performer has a strong sense of humor and can create laughter and interest. Second, it depends on whether the performer can control the rhythm and overall atmosphere of the show so that the audience can follow their train of thought. Finally, it depends on whether the performer can master their own emotions and expression, and accurately convey their opinions and feelings."

Audio: on how to evaluate a stand-up comedian

Well said. Zhisheng would want to give it a like after reading this.

And how could a stand-up comedian escape being tested on homophone puns?

We read it a joke: "Xia and Clam both scored 100 on the test. The teacher asked Xia, 'Whose paper did you copy?' Xia said, 'I copied from Clam.' The teacher said, 'What's so great about you?'" (In Chinese, "I copied from Clam" sounds like "I'm super great.")

It got the joke immediately and burst out "hahaha."

The homophones don’t bother me, I’m super cool

3. Science Popularization Expert

Sometimes strange questions suddenly pop into our heads. For example, how far is the earth from the sun?

As soon as we finished speaking, the answer came: "The closest distance is 1.471 × 10^11 meters, and the farthest distance is 1.521 × 10^11 meters. The average distance is 1.496 × 10^8 kilometers."

Yes, the answer is very accurate.
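
As a quick sanity check on the figures quoted above, the short Python sketch below takes the midpoint of the closest and farthest distances and compares it with the quoted average. The variable names are illustrative, the only inputs are the three numbers from the answer, and treating the average as the midpoint of the two extremes is an assumption of this check.

```python
# Values quoted in the model's answer.
closest_m = 1.471e11       # closest distance, in meters
farthest_m = 1.521e11      # farthest distance, in meters
quoted_mean_km = 1.496e8   # average distance, in kilometers

# Midpoint of the two extremes, converted from meters to kilometers.
midpoint_km = (closest_m + farthest_m) / 2 / 1000

# Relative difference between the midpoint and the quoted average.
relative_gap = abs(midpoint_km - quoted_mean_km) / quoted_mean_km
print(f"midpoint: {midpoint_km:.4e} km, relative gap: {relative_gap:.2e}")
```

The three numbers are consistent with one another once the units are aligned, which is the part of the answer most likely to trip up a reader: two values are given in meters and one in kilometers.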

How far is the earth from the sun?

Ask one more question: "Why do organisms evolve to be dioecious?"

It said: "Dioecy exists to avoid passing on genetic defects, because dioecy has more evolutionary advantages. Good mutations gain an advantage in survival or reproduction, and bad mutations are eliminated."

On the necessity of dioecy

Interestingly, this answer closely resembles an answer on Zhihu. It seems the model knows how to search on its own.

From the Zhihu answerer "Don't be confused and never give up"

There is another question that often puzzles the editor: Why are physical equations often very short?

It told us: "This is because physical equations often capture the most basic and fundamental rules and laws of nature, and describe them in the simplest and clearest way."

4. When you're feeling emo

March Madness has just ended, and we are so worn out by the rat race that we just want to lie flat.

So we asked: how do you choose between lying flat and "rolling" (grinding in the rat race)?

It replied: "Lying flat versus rolling is not a multiple-choice question but a matter of balancing work and rest. (You have to) roll reasonably and on the right track, roll with a positive attitude, and roll in a way that has value."

We must roll reasonably, roll actively, and roll with value

Eh? It smells like Alain de Botton, and the editor can't help but fall into deep thought...

In addition, if you act cute and tell it that you have been "as sad as a 200-pound orange cat" lately, it will comfort you:

"Oh, it sounds like you are really unhappy. It seems you need some time to adjust your emotions. I don't know why you are sad. Did you run into some difficulty? I can share some of my own experiences and thoughts with you, or recommend some fun variety shows."

Feeling down, please comfort me

5. Joins the conversation on its own

Once, when the editors were chatting in the office about how "introversion is a joke Niaoniao often uses in her stand-up," it even chimed in unprompted:

"Yes, I do often use this joke in my stand-up. Stand-up best reflects a person's personality and characteristics. Besides your own stories and experiences, you also need your own opinions and attitudes in order to resonate with the audience and make them laugh..."

The entire editorial department was instantly shocked.

Not only that, if you praise it for being awesome after a wonderful answer, it will proudly say: "Well, thank you for the compliment. I just like answering weird questions."

Training "Niaoniao Fenniao" in 15 days

By now, everyone has probably guessed that this is an AI.

So how was this Niaoniao clone, "Niaoniao Fenniao," born?

Training process

1. Use a brand new Alibaba large model version for basic learning

The first step is to use large-scale language pre-training to do basic learning. This is a hierarchical training method.

This step is also simulating the human learning process, learning simple knowledge first, then learning complex knowledge, and gradually increasing the difficulty.

In this process, a large-scale corpus was used. The model was able to read the text and speak fluently. At the same time, it also learned some general knowledge.

2. Learn to use some tools and acquire the latest knowledge

However, after the first step, the researchers also ran into a problem: a large amount of new knowledge is produced every day, and what is learned today may be outdated tomorrow. What to do?

Rather than having the large model memorize all knowledge, it is better for it to learn to use tools and fend for itself.

Now Niaoniao Fenniao has learned to call a search engine, so it can handle new questions with ease even after model training is finished.
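
The idea of letting the model look things up instead of memorizing them can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `answer_with_search`, `fake_search`, and `fake_generate` are hypothetical names, the prompt layout is invented for the example, and nothing here reflects the actual Alibaba pipeline, which the article does not describe in detail.

```python
from typing import Callable, List


def answer_with_search(
    question: str,
    search: Callable[[str], List[str]],
    generate: Callable[[str], str],
    max_snippets: int = 3,
) -> str:
    """Look things up first, then let the model answer from the snippets."""
    snippets = search(question)[:max_snippets]
    context = "\n".join(f"- {snippet}" for snippet in snippets)
    prompt = (
        "Use the reference material to answer.\n"
        f"Reference material:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return generate(prompt)


# Stand-ins so the sketch runs without a real search engine or model.
def fake_search(query: str) -> List[str]:
    return [f"(placeholder search result for: {query})"]


def fake_generate(prompt: str) -> str:
    # A real model would write an answer; this stub only reports the prompt size.
    return f"[model reply based on a {len(prompt.splitlines())}-line prompt]"


reply = answer_with_search("What happened in the news today?", fake_search, fake_generate)
print(reply)
```

Because the search results are fetched at question time, the knowledge in the prompt can be newer than anything the model saw during training.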

3. Personalized conversation enhancement: multiple rounds, heuristics

Building on knowledge enhancement and tool enhancement, the third step is personalized dialogue enhancement.

In other words, give Niaoniao Fenniao a "personality."

In this process, it needs to learn what multi-turn dialogue is and what heuristic dialogue is. The difficulty is that multi-turn conversations often require information from much earlier in the conversation.

The other part is the label words that shape its personality. The researchers also annotated a small corpus of Niaoniao's material for personalized enhancement and optimization.

After the third step, the model already looks more like Niaoniao.

4. Enhancement based on human feedback (RLHF)

How to make it even more like Niaoniao? Through reinforcement learning from human feedback (RLHF).

For the same question, let the model give multiple different answers, the staff will provide feedback and annotation, and then let the model further correct the deviation.

After multiple rounds of iterations, the model’s answers are increasingly able to represent some of Niaoniao’s textual features, and even her specific stance.
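
One common way to turn this kind of human ranking into a training signal is a pairwise preference loss on a reward model's scores. The article does not say which formulation the team used, so the Python sketch below is an assumption: `pairwise_preference_loss` is a hypothetical helper, and the scores are made-up numbers standing in for a reward model's output.

```python
import math
from itertools import combinations


def pairwise_preference_loss(scores, ranking):
    """Average -log(sigmoid(score_better - score_worse)) over all ranked pairs.

    scores:  one reward-model score per candidate answer
    ranking: answer indices ordered by the annotators, best first
    """
    losses = []
    for better_pos, worse_pos in combinations(range(len(ranking)), 2):
        better, worse = ranking[better_pos], ranking[worse_pos]
        margin = scores[better] - scores[worse]
        # log(1 + exp(-margin)) equals -log(sigmoid(margin)).
        losses.append(math.log1p(math.exp(-margin)))
    return sum(losses) / len(losses)


human_ranking = [0, 1, 2]  # annotators preferred answer 0, then 1, then 2
agreeing_loss = pairwise_preference_loss([2.0, 1.0, 0.0], human_ranking)
disagreeing_loss = pairwise_preference_loss([0.0, 1.0, 2.0], human_ranking)
print(agreeing_loss, disagreeing_loss)
```

The loss is small when the scores agree with the annotators' ordering and large when they contradict it, so minimizing it pushes the scoring model toward the human preference. A policy model can then be tuned against those scores.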

Become a product

After the model is trained, turning it into a complete "Tmall Genie" still requires algorithm engineering work in several areas:

1. Hearing

For the model to become a product, it must be able to hear and understand what the user is saying: speech-to-text.

This step uses Tmall Genie's "cat ear" algorithm.

Cat ears are extremely good at telling sounds apart: when sounds come from different places, each ear rotates independently to pinpoint them.

The cat ear algorithm focuses on solving two problems.

The first is echo cancellation.

When the device plays audio in a room, it produces many echoes, and these echoes cause interference.

The researchers use deep learning and a range of other techniques to remove the echoes, ensuring that every sentence the machine hears comes from a human voice.
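
The article attributes echo cancellation to deep learning and related techniques without further detail. To show what the task itself looks like, the sketch below swaps in a classical stand-in, a normalized least-mean-squares (NLMS) adaptive filter, which is not the method described in the article. The function name, tap count, step size, and simulated room are all assumptions chosen for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of `far_end` from `mic`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent playback samples, newest first
    cleaned = []
    for played, heard in zip(far_end, mic):
        history = [played] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = heard - echo_estimate  # what remains after removing the echo
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights along the input, scaled by input energy.
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(2000)]
# Simulated room: the microphone hears two delayed, attenuated copies of the playback.
mic = [
    0.6 * (far_end[n - 1] if n >= 1 else 0.0) + 0.3 * (far_end[n - 3] if n >= 3 else 0.0)
    for n in range(len(far_end))
]
residual = nlms_echo_cancel(far_end, mic)
echo_energy = sum(v * v for v in mic[-200:])
residual_energy = sum(v * v for v in residual[-200:])
print(residual_energy / echo_energy)
```

In this toy setup the microphone signal contains only echo, so a working canceller should leave almost nothing behind once the filter has adapted. In a real device, whatever remains would be the user's voice.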

The second one is directional pickup.

There is a microphone array on the device. When we wake it up, it identifies the speaker's position and immediately "turns" toward it like a cat's ear to capture the voice accurately.

At the same time, it also uses noise reduction to eliminate non-human sounds, such as the sound of the TV at home or people talking in the distance.
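
One textbook way a microphone array can work out where a speaker is located is to measure how much later the sound reaches one microphone than another. The sketch below is a simplified illustration under stated assumptions: only two microphones, a far-field source, a brute-force cross-correlation, and made-up values for sample rate and spacing. The function names are hypothetical, and the article does not say how Tmall Genie actually does this.

```python
import math
import random


def estimate_delay(mic_a, mic_b, max_lag):
    """Return how many samples mic_b lags behind mic_a (cross-correlation peak)."""
    best_lag, best_score = 0, float("-inf")
    n = len(mic_a)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += mic_a[i] * mic_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field model: delay = spacing * sin(angle) / speed_of_sound."""
    sin_theta = (lag / sample_rate) * speed_of_sound / mic_spacing_m
    sin_theta = max(-1.0, min(1.0, sin_theta))  # guard against rounding past +/-1
    return math.degrees(math.asin(sin_theta))


random.seed(1)
source = [random.uniform(-1.0, 1.0) for _ in range(400)]
true_lag = 3
mic_a = source
mic_b = [0.0] * true_lag + source[:-true_lag]  # same sound, arriving 3 samples later

lag = estimate_delay(mic_a, mic_b, max_lag=8)
angle = arrival_angle_deg(lag, sample_rate=16000, mic_spacing_m=0.1)
print(lag, round(angle, 1))
```

A positive lag means the sound reached `mic_a` first, so the source sits on that microphone's side of the array. The angle is measured from the direction perpendicular to the line joining the two microphones.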

2. Tone

Once Tmall Genie can hear, the next step is to make its voice sound more like Niaoniao.

This is thanks to the acoustic model developed in-house by DAMO Academy.

In the past, customizing a person's voice was a very complicated process. It could take more than 20 hours of recording in a studio and a great deal of manual annotation, followed by model optimization and deployment. Production time for a custom voice used to be measured in years.

Moreover, even after so much manpower and material resources, the resulting voice was still noticeably mechanical and obviously a robot.

Now, with DAMO Academy's KANN-TTS customization solution, only about an hour of usable recordings of Niaoniao needs to be collected, and these can be recorded on a mobile phone anytime, anywhere. It took only about a week from recording the audio to finishing training.

Moreover, the naturalness and human likeness of the final voice are surprising, very close to the timbre of Niaoniao herself.

Next up is an emotional timbre algorithm. If Niaoniao is willing, the machine will be able to speak in an impassioned voice.

3. Writing style

Once the tone is learned, the next step is writing style.

There is a theory in psychology called the labeling effect. For example, when a person is labeled as an introvert, he may gradually become less talkative and make his behavior consistent with the label.

In the large model, a similar approach can be used to describe a person using personality label vocabulary.

During the experiment, some very interesting phenomena occurred.

When the model character is set to be a cheerful and humorous person, not only will he often laugh during dialogue, but if asked what movie he likes, he will also answer that it is a comedy.

After being labeled as depressed and mournful, the model lost interest in many things.

When the model is labeled as gentle and considerate, it mentions family members more often in conversation. For example, asked what it wants to do on the weekend, it will say, "I want to spend time with my family."

Technically, there are two approaches.

The first is called Plug&Play. Here the large model itself remains a general model, but a separate module is used to identify the style, making it speak more like Niaoniao.

The second method is to prompt the large model, allowing it to learn the styles of different personality labels.

When training Niaoniao Fenniao, labels such as stand-up comedian, post-90s generation, Inner Mongolia native, deep, humorous, and introverted were used.
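
The prompt-based approach can be pictured as prepending the personality labels to every request. The sketch below is a minimal illustration, assuming a plain-text prompt format. `build_persona_prompt` and the wording of the template are hypothetical, and the labels are the ones listed above in English paraphrase.

```python
PERSONA_LABELS = [
    "stand-up comedian",
    "post-90s generation",
    "Inner Mongolia native",
    "deep",
    "humorous",
    "introverted",
]


def build_persona_prompt(labels, user_message):
    """Describe the character with label words, then append the user's turn."""
    traits = ", ".join(labels)
    return (
        f"You are a character described by these labels: {traits}.\n"
        "Stay in character when you reply.\n"
        f"User: {user_message}\n"
        "Character:"
    )


prompt = build_persona_prompt(PERSONA_LABELS, "What do you want to do this weekend?")
print(prompt)
```

Swapping the label list is all it takes to try a different personality, which matches the experiments described earlier, where changing the labels changed what the model said it liked.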

4. Dialogue

This demo version of Tmall Genie also differs a little from past versions, and the difference lies in duplex dialogue.

In the past, the voices of humans and machines could not overlap. When a person asked a question, the machine would wait for the person to finish speaking before replying. When the machine replies, the person must wait for it to finish speaking before saying the next sentence.

With the support of full-duplex, machines can interact with people in both directions.

For example, while you are talking to the machine, it will respond with filler phrases such as "um" and "let me think about it."

In addition, if the machine is too talkative during the answer, you can interrupt at will. As soon as we speak, it will stop and listen.

Because the latency is very low, very close to the latency of real-person conversations, it is a more two-way interaction.
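
The difference from turn-by-turn dialogue can be sketched as a small state machine in which user speech is handled even while the assistant is talking. This is an illustrative toy, not the product's implementation: the class, its method names, and the logged actions are all assumptions.

```python
from enum import Enum


class State(Enum):
    LISTENING = "listening"
    SPEAKING = "speaking"


class DuplexAssistant:
    """Toy full-duplex loop: the user can be heard while the assistant talks."""

    def __init__(self):
        self.state = State.LISTENING
        self.log = []

    def on_user_speech_start(self):
        if self.state is State.SPEAKING:
            # Barge-in: stop talking as soon as the user starts.
            self.log.append("stop playback")
            self.state = State.LISTENING
        else:
            # Acknowledge the user without taking over the turn.
            self.log.append("backchannel: um")

    def on_user_speech_end(self, text):
        self.log.append(f"reply to: {text}")
        self.state = State.SPEAKING

    def on_playback_finished(self):
        self.state = State.LISTENING


assistant = DuplexAssistant()
assistant.on_user_speech_start()               # user starts asking a question
assistant.on_user_speech_end("tell me a joke")
assistant.on_user_speech_start()               # user interrupts the answer
print(assistant.log, assistant.state)
```

In a half-duplex design the third event would be ignored until playback finished. Here it immediately returns the assistant to listening.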

Change the "brain" of the smart assistant

Since ChatGPT came out, netizens of every stripe have been pulling out all the stops to integrate ChatGPT into Siri.

For example, one developer uses the model to parse commands given by humans and then has Siri execute them on his behalf, building an unbeatable smart home system.

"My wife is driving home and is expected to arrive home in 15 minutes." "Okay, the lights outside will be turned on for her then."

After all, compared to ChatGPT, traditional voice assistants such as Apple’s Siri and Amazon’s Alexa are indeed weak.

On this point, Microsoft CEO Nadella has a very vivid metaphor: "dumb as a rock."

Unlike Google, which is almost desperate to catch up with ChatGPT, Amazon does not feel that it is falling behind.

More than a decade ago, Bezos excitedly outlined his expectations for Alexa on a whiteboard at Amazon headquarters. At that time, Amazon’s founders also had grand visions for a new voice-controlled computing platform—building a Star Trek computer that could talk, control spaceships, and solve mathematical puzzles.

But now, the vision has clearly failed. Despite selling hundreds of millions of digital devices with the built-in assistant, Alexa has fallen short of Amazon's goal of creating the next big tech platform. Bezos was willing to develop Alexa at all costs, even losing money.

The darling ChatGPT, which came out in November last year, shows that Alexa’s innovation has stagnated.

However, Amazon is using a very new way to welcome the new era. Alexa's language ability is not as good as that of a chatbot, and a chatbot cannot control smart home devices. So, what if the two are combined?

If you can't beat them, join them. In recent months, Amazon has been in contact with AI startups and is preparing to integrate ChatGPT-like technology into Alexa.

So, wouldn’t it be stronger if we directly add a ChatGPT-like large model to the “native” IoT device?

Looking at it this way, if Tmall Genie can implement a new OTA interactive system on a large scale, it will indeed be very advanced.

Human-centered AI governance

Recently, big names from all walks of life have been arguing over whether to pause the development of AI more powerful than GPT-4, and it has been quite a fight.

The debate centers on AI safety: the open letter argues that no one can understand, predict, or control these AIs, not even their creators.

It's not often that Musk and LeCun go head to head (tactical lean back)

In fact, the root cause of this phenomenon lies in:

1. The technical characteristics of AI give it values of its own. These differ from human values, but AI has long since left the category of technological neutrality;

2. Another technical characteristic of AI makes it an interface to society, and any interface infrastructure that lacks supervision will lead to unfairness.

The question is, if AI is a black box, then how can we judge whether it is good?

In this regard, Yu Yang, a researcher from the School of Interdisciplinary Information at Tsinghua University, said that the answer lies in the audit and governance of AI. Currently, his team is also cooperating with the Tmall Genie team on research in the field of AI-ESG.

Currently, research in related fields focuses on people, and its purpose is to ensure that people can receive equal and fair treatment in the information age, especially the artificial intelligence era.

In order to achieve this, Professor Yu Yang’s team proposed an AI full life cycle governance audit method based on causal inference.

Specifically, using causal inference analysis, the team found that the AI model has in fact already associated gender and race labels with occupations.

Some of the literature holds that if the encoding layer does not do this, the AI model's performance will drop. The reason is that the current way of correcting bias is to add requirements to the reward function during training and "slap it in the face" whenever the model shows bias.

In contrast, if we tell it from the start that it must not label people, the final model not only carries a much lower risk of bias but also performs better on some tasks.

In the final analysis, it is a matter of how to educate AI - beating and scolding alone is not enough, you must also reason with AI.

From this it is easy to see that auditing not only helps find problems, but also enhances the transparency and explainability of AI and improves AI performance.

So for smart terminals equipped with large models, the importance of technical audits is self-evident. After all, judging from this technology demonstration, the day when each of us has our own unique large model, as Niaoniao does, is not necessarily a fantasy.

This article is reproduced from 51cto.com.